Abstract

In this paper we consider the concept of ‘closeness’ between nodes in a weighted
network that can be defined topologically even in the absence of a metric. The
Generalized Erdős Numbers (GENs) [1] satisfy a number of desirable properties as a
measure of topological closeness when nodes share a finite resource: they are
real-valued and non-local, and can be used to create an asymmetric matrix of
connectivities. We show that they can be used to define a personalized measure of the
importance of nodes in a network with a natural interpretation that leads to a new
global measure of centrality and is highly correlated with PageRank. The relative
asymmetry of the GENs (due to their non-metric definition) is also linked to the
asymmetry in the mean first passage time between nodes in a random walk, and we use a
linearized form of the GENs to develop a continuum model for ‘closeness’ in spatial
networks. As an example of their practicality, we deploy them to characterize the
structure of static networks and show how it relates to dynamics on networks in such
situations as the spread of an epidemic.

The study of complex networks has increased enormously in recent years due to their
applicability to a wide range of physical [2, 3], biological [4], epidemiological [5, 6], and
sociological [7] systems. Two basic goals in this regard are to understand and quantify
the structure of the network, in order to better characterize the relationship between the
interacting members of the network (the nodes), and to characterize the dynamical processes
on the network [7] that may shed light on the processes by which networks form [8].

Understanding the topological properties of the network on both a global and local level
can be useful in approaching both of these goals. Global properties of interest may include
simple measures of the distribution of node properties, such as the degree distribution,
$P_d(k)$, with $k$ the number of edges incident upon a node; the strength distribution, $P_s(s)$,
with $s$ the total weight incident upon a node; or the distribution of clustering coefficients,
$P_c(c)$, with $c$ the fraction of triplets of connected nodes that are closed [9, 10]. Community
structure in the network [11, 12, 13], which partitions the network into densely connected
sub-networks with more links within communities than between communities, has been
extensively studied and may provide more detailed information about the relationship
between nodes than simple distributions. Community structure in networks can indicate
the existence of underlying similarities between nodes in the network, may have a great
impact on dynamical processes occurring on the network (such as a random walk [14, 15, 16]
or epidemic spreading [17, 18]), and can influence the material properties of granular
systems [2].

While global properties of networks can be used to assess the attributes of the nodes on an
aggregate level, it is also of great interest to understand the topological properties of nodes
on an individual, local level. Node centrality is the classic example of a topological
measure of an individual node, which assesses the ‘importance’ of a node in a variety of
contexts. The most basic measure of a node’s centrality is simply related to its degree,
a property of the node that is based solely on the local topology of its connectivity. The
centrality of individual nodes can also be measured by incorporating the global topology of the
network in a variety of ways, including PageRank [19], betweenness [16], or random walk [ITT]
centralities. Each of these measures reduces the global properties of the network into an
individualized local measure of importance, permitting a rank-ordering of the nodes by their
importance in the network [20, 21].

In many contexts [22, 23], not all members of the network will necessarily agree on the
importance of the same node: nodes that have a direct connection between them will be
more important to each other than distant nodes in the network. Nodes that are central
to the network as a whole may have very low importance from the perspective of
sub-networks. The universality of importance is further complicated by the fact that we may
expect the influence between a pair of nodes to be asymmetric even if they are directly
connected [23] (the importance assigned by an important node towards an unimportant
one is not necessarily the same as the importance assigned in the opposite direction),
which may have important consequences in real-world systems [1]. The determination of
a personalized measure of node importance that incorporates the global topology in an
asymmetric measure is therefore an important but non-trivial problem.

In this paper, we use the recently developed Generalized Erdős Numbers [1, 12] (GENs),
which provide a nonlinear asymmetric measure of the pairwise ‘closeness’ between nodes
based on the network’s global topology, as a proxy for the importance one node feels
towards another. We show that this measure of closeness can be linked to the dynamics of
random walks and the spreading of epidemics on networks. Using the GENs, we develop a
linearized importance that provides an intuitive measure of the (asymmetric) importance one
node feels towards another based on the global network topology. This linearized
importance is considered in a continuum approximation for a spatial network with homogeneous
distance-dependent links between material points on a sphere, coupled to long-range
distance-independent connection strengths, and is shown to reduce to an inhomogeneous
Fredholm equation with a kernel depending on the topology of the spatially-independent
portion of the network. Finally, we show that a global measure of centrality defined using
the GENs is consistent with other centrality measures, and that the asymmetry in the
GENs is consistent with the asymmetry of the mean first passage time (MFPT) between
nodes in a random walk. The flexibility of the GENs in understanding the topology of
complex networks shows their utility in a wide range of problems.

Figure 1: Two competing requirements for global ‘closeness’ in a network with shared
resources. In (A), many short paths between nodes increase the closeness between them.
This is similar to the resistance distance between nodes: additional parallel paths between
them reduce their resistance distance. In (B), the finite resources of the high-degree blue
node suggest it should be less close to the red node than the lower-degree blue node
above, as resources are shared with the other neighbors as well. This is similar to the
transition probability from the blue node in a random walk: the more connections the blue
node has, the lower the probability of visiting the red node.


1 The Generalized Erdős Numbers

When nodes represent objects in a physical space[2Tj 123 S3 S, 122], the distance between
nodes, $D_{ij}$, is a naturally defined (metric) measure of closeness between the objects. Due
to the generality of networks (where nodes and edges abstractly represent ‘objects’ and
‘interactions,’ respectively), there can be no guarantee of a naturally defined distance
metric HSUS], and, in some cases, the network topology itself must define a measure of
closeness ($A_{ij}$) based solely on the matrix of weights between nodes $i$ and $j$, $w_{ij}$ (with an
undirected network where $w_{ij} = w_{ji}$ is assumed throughout this paper). The closeness,
$A_{ij}$, will be small for nodes that are close to one another and large for distant nodes, with
a simple and common choice being $A_{ij} = w_{ij}^{-1}$ (so strongly connected nodes are ‘close’,
and disconnected nodes are ‘far’). Alternatively, in an unweighted network, the length of
the shortest path between a pair of nodes is a natural definition [28, 29] and is the basis for
the classic Erdős numbers in the context of an unweighted collaboration network [30].
Improvements on this measure which incorporate the effect of multiple paths between nodes
(see Fig. 1(A) for a schematic diagram) include the resistance distance [13, 31],
self-consistent similarity measures [32], and communicability [33], to name only a few. An
additional approach to defining similarity between nodes is found by positing a
multidimensional ‘latent space’ of node properties [34], with the assumption that nodes that
are close in the latent space are likely to be connected in the network and with each node’s
position in the space inferred from the observed connectivity. Each of these methods
incorporates the global topology of the network into a symmetric measure of closeness
between pairs of nodes ($A_{ij} = A_{ji}$).

Finite resources are shared in some networks, with examples including collaboration
networks (where time with one collaborator reduces the available time for others),
multi-core processor components [35] (where finite memory or other hardware must be shared),
and random walks (where the walker can only move to a single neighbor at a time with a
transition probability $P_{i \to j} = w_{ij}/W_i$, with $W_i = \sum_k w_{ik}$ the total strength of the node $i$).
In the context of these networks of limited resources, closeness measures such as resistance
distance may be undesirable [23], because the addition of a new edge in the network should
be detrimental to some nodes (those who receive less of the finite resource due to the new
edge) and beneficial to others (those who receive more due to the edge). For closeness
measures based on the direct weight between nodes (where the ‘closeness’ between $i$ and
$j$ is often taken to be $w_{ij}^{-1}$) or on the resistance distance between nodes, it is
straightforward to see that the newly measured closeness between nodes $i$ and $j$ satisfies
$A_{ij}^{(\mathrm{new})} \le A_{ij}^{(\mathrm{old})}$ for all pairs,
i.e. the addition of an edge can never cause nodes to feel less close to one another. This
is not sensible in the context of a node that shares a finite resource with its neighbors, as
shown in Fig. 1(B): if a node $i$ has many neighbors, each receives less of the resource
than if $i$ had few neighbors. A quantity such as the transition probability in a random
walk, $P_{i \to j}$, is asymmetric and ensures that nodes are closer if they have few neighbors,
as pictured in Fig. 1(B) (so a walker is more likely to pass between them than if they had
many connections). However, it is not a global measure of closeness because the transition
probability incorporates only the nearest neighbor connections between nodes. It is useful
to develop a measure of closeness that incorporates the two (sometimes contradictory)
aspects depicted in Fig. 1: nodes feel close if there are many paths between them, but
popular nodes are less close to their neighbors than unpopular nodes.
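
To make the asymmetry of the transition probability concrete, the following sketch (in Python with NumPy, neither of which is used in the original work) builds $P_{i \to j} = w_{ij}/W_i$ for a small hypothetical weight matrix. The function name `transition_matrix` and the four-node example network are illustrative assumptions.

```python
import numpy as np

def transition_matrix(w):
    """Return P with P[i, j] = w[i, j] / W[i], where W[i] = sum_k w[i, k]."""
    w = np.asarray(w, dtype=float)
    strength = w.sum(axis=1)              # total strength W_i of each node
    return w / strength[:, None]          # each row sums to one

# Hypothetical toy network: node 0 is a hub with three neighbors,
# nodes 1 and 2 are also linked to each other, node 3 is a leaf.
w = np.array([[0, 1, 1, 1],
              [1, 0, 1, 0],
              [1, 1, 0, 0],
              [1, 0, 0, 0]], dtype=float)
P = transition_matrix(w)
```

In this example the hub (node 0) steps to the leaf (node 3) with probability 1/3, while the leaf steps to the hub with probability 1, so $P_{0 \to 3} \neq P_{3 \to 0}$ even though $w_{03} = w_{30}$.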

We have recently shown [1] that the Generalized Erdős Numbers ($E_{ij}$, or GENs), describing
the closeness node $j$ feels towards node $i$, satisfy the expected properties for the sharing of
finite resources described in Fig. 1. The GENs are defined as

$$\frac{W_j}{E_{ij}} = w_{ij}^2 + \sum_{l \in C_j,\; l \neq i} \frac{w_{jl}}{E_{il} + w_{jl}^{-1}}, \qquad E_{ii} = 0, \tag{1}$$

where $C_j$ is the set of nodes directly connected to $j$. This form is chosen such that the
node $i$ is as close as possible to itself and that if $j$ is connected to only one node $k$, $j$’s
closeness to $i$ satisfies $E_{ij} = E_{ik} + w_{jk}^{-1}$. If there are multiple paths between nodes, the
closeness $j$ feels to $i$ is strengthened if there is a direct connection between them but also
includes a contribution from all other neighbors of $j$ weighted by their connection strength.
By choosing a harmonic mean for the form of the contribution, we bias our measure of
closeness towards neighbors that themselves feel close to $i$. The GENs are defined using
the global topology of the network, and $E_{ij}$ is finite even if $i$ and $j$ share no neighbors (as
may not be the case for more local measures of closeness E3]).
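
As a small worked illustration of the GEN definition (not taken from the original analysis), consider an unweighted three-node path 0 – 1 – 2, so that $w_{01} = w_{12} = 1$, $W_0 = W_2 = 1$ and $W_1 = 2$. Applying the definition $W_j/E_{ij} = w_{ij}^2 + \sum_{l \in C_j, l \neq i} w_{jl}/(E_{il} + w_{jl}^{-1})$ to the closeness felt towards node 0 gives:

$$
\begin{aligned}
\frac{W_2}{E_{02}} &= \frac{w_{21}}{E_{01} + w_{21}^{-1}} &&\Rightarrow\quad E_{02} = E_{01} + 1,\\
\frac{W_1}{E_{01}} &= w_{01}^2 + \frac{w_{12}}{E_{02} + w_{12}^{-1}} &&\Rightarrow\quad \frac{2}{E_{01}} = 1 + \frac{1}{E_{01} + 2},\\
&&&\Rightarrow\quad E_{01}^2 + E_{01} - 4 = 0, \quad E_{01} = \frac{\sqrt{17} - 1}{2} \approx 1.56 .
\end{aligned}
$$

In the opposite direction, node 0 has node 1 as its only neighbor, so $E_{10} = E_{11} + w_{01}^{-1} = 1$. In this toy case $E_{01} \neq E_{10}$: the leaf feels closer to the middle node than the middle node, which divides its strength between two neighbors, feels towards the leaf.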

This definition of the GENs in Eq. 1 is nonlinear, and the exact values of $E_{ij}$ for
complex networks are not easily determined analytically. $E_{ij}$ can be computed numerically
in an iterative fashion [1], with $E_{ij} = E_{ij}^{(\infty)}$ and the recursive definition
$W_j/E_{ij}^{(n+1)} = \sum_l w_{jl}/[E_{il}^{(n)} + w_{jl}^{-1}]$ (with the constraint that $E_{ii}^{(n)} = 0$ continually enforced). In this paper,
the iteration is halted when $\max_{ij} |E_{ij}^{(n+1)} - E_{ij}^{(n)}| < \epsilon = 0.005$. The method also requires
an initial guess, $E_{ij}^{(0)}$, with the simple initial guess throughout this paper that $E_{ij}^{(0)} = 1$
(the method is robust to variations in this initial value).
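
The iteration described above can be sketched as follows. This is a minimal, dense NumPy illustration under stated assumptions (a connected, undirected network given as a symmetric weight matrix; the function name `generalized_erdos_numbers` and the array convention `E[i, j]` for $E_{ij}$ are ours), not the implementation used for the results in this paper. It stores an $N \times N \times N$ intermediate array, so it is only suitable for small networks.

```python
import numpy as np

def generalized_erdos_numbers(w, tol=0.005, max_iter=10_000):
    """Iterate W_j / E_ij = sum_l w_jl / (E_il + 1/w_jl) with E_ii = 0 enforced.

    E[i, j] is the closeness node j feels towards node i.
    Assumes every node has at least one edge (no isolated nodes).
    """
    w = np.asarray(w, dtype=float)
    n = w.shape[0]
    strength = w.sum(axis=1)                       # W_j
    inv_w = np.full_like(w, np.inf)                # 1/w_jl, infinite where there is no edge
    np.divide(1.0, w, out=inv_w, where=w > 0)
    E = np.ones((n, n))                            # initial guess E^(0) = 1
    np.fill_diagonal(E, 0.0)
    for _ in range(max_iter):
        denom = E[:, None, :] + inv_w[None, :, :]  # denom[i, j, l] = E_il + 1/w_jl
        sums = (w[None, :, :] / denom).sum(axis=2) # missing edges contribute 0 / inf = 0
        E_new = strength[None, :] / sums
        np.fill_diagonal(E_new, 0.0)               # keep E_ii = 0 at every step
        if np.max(np.abs(E_new - E)) < tol:
            return E_new
        E = E_new
    raise RuntimeError("GEN iteration did not converge")

# Unweighted three-node path 0 - 1 - 2 (hypothetical test case).
w_path = np.array([[0, 1, 0],
                   [1, 0, 1],
                   [0, 1, 0]], dtype=float)
E_path = generalized_erdos_numbers(w_path, tol=1e-12)
```

The term with $l = i$ reproduces the direct-connection contribution $w_{ij}^2$ automatically, because $E_{ii} = 0$ is held fixed. For the path above, node 2 has node 1 as its only neighbor, so the converged values should satisfy $E_{02} = E_{01} + 1$.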

Eq. 1 is only one way of satisfying the expectations shown in Fig. 1, and there is a great
deal of functional freedom in satisfying these constraints. For example, any measure $E'_{ij}$ of
the form $W_j\, g(E'_{ij}) = \sum_k w_{jk}\, g(E'_{ik} + w_{jk}^{-1})$ will satisfy the desired behavior depicted in Fig.
1 for a monotonically decreasing $g(x)$, with $g(x) = x^{-1}$ in the definition of Eq. 1. Another
alternate definition will satisfy a triangle inequality at the cost of additional computational
complexity by replacing the direct connection strength, $w_{kj}^{-1}$, with the closeness, $E_{kj}$, in the
denominator: $W_j/E_{ij} = w_{ij}^2 + \sum_{k \neq i} w_{jk}/(E_{ik} + E_{kj})$. While these alternate definitions
may be of interest in certain contexts, we continue to use Eq. 1 throughout this paper,
due to its simplicity and previously demonstrated successes in prediction algorithms [1] and
community detection methods [12]. Variations in the definition of $E_{ij}$ will certainly change
the numerical values of the closeness, but the qualitative behavior of the closeness between
nodes is expected to be robust to perturbations of the definition of the GENs.


2 The GENs in homogeneous networks


While Eq. 1 is not exactly solvable for all but the simplest of network topologies, the
general properties of the GENs can be explored for sufficiently homogeneous networks. The
unweighted Erdős-Rényi (ER) networks have a degree distribution sharply peaked about
the average ($k_i \approx \langle k \rangle$, where $k_i$ is the degree of the node $i$ in an unweighted network), but
we expect the closeness between nodes will still be broadly distributed due to the complex
network topology. The average closeness between nodes can be derived by assuming that
$E_{ij} = E_c$ (the ‘typical connected’ closeness) if $i$ and $j$ are directly connected, and
$E_{ij} = E_d$ (the ‘typical disconnected’ closeness) if they are not. In an unweighted
regular network, with all nodes having the same degree $k_i = k$, it is possible to examine
the average closeness between connected and disconnected nodes using the GENs. For
homogeneous degree distributions such as those of the ER networks, we expect the approximation
$k_i \approx \langle k \rangle$ to be reasonable, with fluctuations in the degree expected to have a relatively
minor impact, particularly for high mean degree. For these homogeneous networks, we
assume that nodes that are directly connected feel a typical closeness $E_c$ towards each
other, and a typical closeness $E_d > E_c$ towards nodes to which they are not directly
connected. If $i$ and $j$ are directly connected they have on average $(k-1)^2/(N-2)$ neighbors
in common (since both have exactly $k$ edges, one of which connects to the other), and they
have $k^2/(N-2)$ neighbors in common if they are not connected. We must split the sum in
Eq. 1 into two parts: a sum over nodes neighboring both $i$ and $j$, and a sum over nodes only
connected to $j$. This gives the approximate equations for an unweighted network of constant degree


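$$\frac{k}{E_c} \approx 1 + \frac{(k-1)^2}{N-2}\,\frac{1}{E_c + 1} + \left(k - 1 - \frac{(k-1)^2}{N-2}\right)\frac{1}{E_d + 1} \tag{2}$$

and

$$\frac{k}{E_d} \approx \frac{k^2}{N-2}\,\frac{1}{E_c + 1} + \left(k - \frac{k^2}{N-2}\right)\frac{1}{E_d + 1}. \tag{3}$$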


It is possible to solve for $E_c$ exactly in terms of $k$, $N$, and the unknown $E_d$, with

$$E_c = \frac{2 + kE_d^2 - N}{-2 + kE_d + N}. \tag{4}$$
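
The step leading to the expression for $E_c$ in terms of $E_d$ can be made explicit. Writing $a = k^2/(N-2)$ for the expected number of common neighbors of a disconnected pair (a shorthand introduced only for this sketch), the disconnected-pair relation is treated as an equality and solved for $E_c$:

$$
\begin{aligned}
\frac{k}{E_d} &= \frac{a}{E_c + 1} + \frac{k - a}{E_d + 1},\\
\frac{a}{E_c + 1} &= \frac{k}{E_d} - \frac{k - a}{E_d + 1} = \frac{k + aE_d}{E_d(E_d + 1)},\\
E_c + 1 &= \frac{aE_d(E_d + 1)}{k + aE_d} = \frac{kE_d(E_d + 1)}{N - 2 + kE_d},\\
E_c &= \frac{kE_d^2 + 2 - N}{N - 2 + kE_d}.
\end{aligned}
$$

The numerator vanishes at $E_d = \sqrt{(N-2)/k}$, so a positive $E_c$ requires $E_d > \sqrt{(N-2)/k}$. This already indicates that the typical disconnected closeness must grow at least as fast as $N^{1/2}$ at fixed $k$.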


An exact solution for $E_d$ is unwieldy, but it is possible to find a solution for $E_d$
asymptotically for large $N$, and we find that

$$E_d \sim \ldots + O(N^{-1/2}). \tag{5}$$

Figure 2: The distribution of $E_{ij}$ split into cases where $i$ and $j$ are directly connected in
(A,B) and not directly connected in (C,D) for Erdős-Rényi networks with $N = 512$ or 1024
and $\langle k \rangle = 4$ (A,C) or $\langle k \rangle = 20$ (B,D). Note the changing axes in all figures. The predicted
averages of $E_c$ and $E_d$ are marked using the same color schemes as in the figures (where
the predictions include the higher-order terms found in Appendix A). For $\langle k \rangle = 20$, there
is excellent agreement between the theoretical and simulated averages. For $\langle k \rangle = 4$, the
GENs are far more heterogeneous than can be captured using the simple model in Eqs.
5 and 6, and the theoretical predictions do not agree well with the observed behavior for
both connected and disconnected nodes.

Comparing this expression to the numerical solution of the equation shows less than 1%
deviation for $N \sim 1000$ and $k < 300$, suggesting the truncation to terms of order $O(N^0)$ is
sufficient for large $N$ over a wide range of $k$. A good approximation for $E_c$ can be found
by setting $k = \kappa N$ and taking the limit of $\kappa \to 0$. We find


$$E_c \approx \ldots \sim k + O(k^2 N^{-1/2}), \tag{6}$$

where the latter is the scaling for sufficiently large $N$. Even for $N \sim 1000$, higher order
terms can contribute in the series for only moderate values of $k$, and the full expression is
required to obtain an accurate estimate.
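
Because the asymptotic expressions can be inaccurate at moderate $k$, it may be simpler to solve the two homogeneous-network equations numerically. The sketch below is one way to do so and is not the procedure used in this paper: it eliminates $E_c$ using the exact relation $E_c = (2 + kE_d^2 - N)/(-2 + kE_d + N)$ and then finds $E_d$ by plain bisection on the residual of the connected-pair equation. All function names are hypothetical, and the bracket $\sqrt{(N-2)/k} < E_d < N$ is an assumption that holds for the example shown.

```python
import math

def eq2_residual(e_c, e_d, k, N):
    """Residual of k/E_c = 1 + a_c/(E_c+1) + (k-1-a_c)/(E_d+1), a_c = (k-1)^2/(N-2)."""
    a_c = (k - 1) ** 2 / (N - 2)
    return k / e_c - 1 - a_c / (e_c + 1) - (k - 1 - a_c) / (e_d + 1)

def eq3_residual(e_c, e_d, k, N):
    """Residual of k/E_d = a_d/(E_c+1) + (k-a_d)/(E_d+1), a_d = k^2/(N-2)."""
    a_d = k ** 2 / (N - 2)
    return k / e_d - a_d / (e_c + 1) - (k - a_d) / (e_d + 1)

def e_c_from_e_d(e_d, k, N):
    """Exact elimination of E_c from the disconnected-pair equation."""
    return (2 + k * e_d ** 2 - N) / (-2 + k * e_d + N)

def homogeneous_gens(k, N, iters=200):
    """Return (E_c, E_d) for a regular unweighted network of degree k and size N."""
    lo = math.sqrt((N - 2) / k) * (1 + 1e-9)   # just above the point where E_c = 0
    hi = float(N)                              # assumed upper bracket
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if eq2_residual(e_c_from_e_d(mid, k, N), mid, k, N) > 0:
            lo = mid                           # residual still positive: root lies above
        else:
            hi = mid
    e_d = 0.5 * (lo + hi)
    return e_c_from_e_d(e_d, k, N), e_d

E_c, E_d = homogeneous_gens(k=20, N=1000)
```

Both residual functions are kept separate so that a candidate pair $(E_c, E_d)$ can be checked against each equation independently of how it was obtained.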

In Fig. 2 we see the distribution of the GENs for Erdős-Rényi networks with varying $N$ and
$\langle k \rangle$. In (A-B) we see that changing $\langle k \rangle$ radically alters the mean values of $E_{ij}$ as well as the
shape of the distributions, while changing $N$ only marginally affects the distribution of the
connected GENs. For $\langle k \rangle = 4$ the distribution of $E_{ij}$ exhibits multiple peaks in Fig. 2(A),
with each local maximum corresponding to a different degree of the node $j$ and with the
width of the distribution about the peak coming from differing degrees of the node $i$. Such
heterogeneity is less apparent for high-degree nodes (Fig. 2(B)), where fluctuations in the
degree of $i$ or $j$ have less of an impact on the GENs, and the distributions are unimodal.
For disconnected nodes, the distributions have a single dominant peak (Fig. 2(C-D)),
and the locations of the peaks are well predicted for $\langle k \rangle = 20$. Due to the heterogeneity
and significance of degree fluctuations for the smaller $\langle k \rangle = 4$, there are large differences
between the predicted and observed averages.

In contrast to the homogeneous degree distribution of the Erdős-Rényi random network
model, Barabási-Albert (BA) networks [8] have a scale-free, heterogeneous degree
distribution, and Fig. 3 shows that the distributions of the GENs for BA networks are likewise
heterogeneous for directly connected nodes. The distributions of the GENs between nodes
that share an edge (shown in Fig. 3(A-B)) appear to have a relatively fat tail and
approximately satisfy $\Pr(E_{ij} = E) \sim E^{-x}$, with
an empirically determined scaling exponent near 1.5 for $\langle k \rangle = 4$ and around 2.1-2.2 for
$\langle k \rangle = 20$ (shown in Fig. 4). This is in comparison to the fat-tailed degree distribution
with the $P(k) \sim k^{-3}$ scaling of the BA networks for both values of $\langle k \rangle$. Interestingly, the
distribution of the GENs for disconnected nodes does not depend as strongly on the
scale-free nature of the degree distribution, with similar qualitative features found in both Fig.
2(C-D) for the ER networks and Fig. 3(C-D) for the BA networks. While the existence of hubs
in the BA networks tends to give a higher probability of finding smaller values of $E_{ij}$ for
disconnected nodes in comparison to ER networks, the most likely values of $E_{ij}$ are similar
for disconnected nodes in either network topology (in contrast to the radically different
distributions for connected nodes). We have considered only unweighted networks in
this analysis, and allowing weighted edges further complicates the analysis of the ‘typical’
GEN between nodes unless a homogeneity assumption on the distribution of weights is
likewise made.


3 Random Walks and the GENs 

The mean first passage time (MFPT) from node $i$ to node $j$ ($\tau_{ij}$) is of interest in many
contexts, and because of the limited resource represented by the random walker, it is
worthwhile to see the relationship between the rate of travel between nodes and how ‘close’
they are as measured by the GENs. Tetali [36] has shown that the MFPT in an unbiased
random walk can be computed directly from the resistance distance [15, 16] $R_{ij}$ with
$\tau_{ij} = \frac{1}{2}\sum_l W_l (R_{ij} + R_{jl} - R_{il})$. It is easily seen that the MFPTs are asymmetric ($\tau_{ij} \neq \tau_{ji}$),
in general, as it is easier to reach a high-degree node than a low-degree node (much like
the asymmetry in the GENs with $E_{ij} \neq E_{ji}$). We intuitively expect that if node $j$ feels
‘close’ to node $i$ (small $E_{ij}$) but node $i$ does not feel close to node $j$ (large $E_{ji}$), a random
walker will pass more readily from $j$ to $i$ than from $i$ to $j$. We therefore compare
$\delta E_{ij} = E_{ij} - E_{ji}$ to the difference in the MFPT between nodes, $\delta\tau_{ij} = \tau_{ij} - \tau_{ji}$, in Fig. 5 and
empirically find the asymmetry in the MFPT is highly correlated with the asymmetry in the GENs, with
an empirical scaling of $\delta\tau_{ij} \approx \delta E_{ji}\sqrt{\alpha N}$, with $N$ the number of nodes in the network
and $\alpha \approx 4$ a topology-dependent constant. The fact that $\delta\tau_{ij} \propto \delta E_{ji}$, even when there
are no direct connections between $i$ and $j$, indicates that the GENs are able to capture
the importance of the global network topology even for distant nodes. It is interesting to
note that while the relative proportionality appears to be strongly dependent on the size
of the network and only weakly dependent on the average degree, it is visually apparent
that $\langle k \rangle$ sets the scale of the fluctuation statistics about the best fit line. For $\langle k \rangle = 4$, the
asymmetry density is relatively dispersed (Fig. 5(A) and (C)) with no obvious structure,
while for $\langle k \rangle = 20$ there is clear clustering of the density about curved bands (Fig. 5(B)
and (D)).

Figure 3: Distributions of the GENs for the Barabási-Albert networks for nodes that do
share a direct connection (A,B) and do not share a connection (C,D) for $\langle k \rangle = 4$ (A,C)
and $\langle k \rangle = 20$ (B,D) and with $N = 512$ (blue solid lines) or 1024 (red dashed lines). The
distributions are far smoother than those seen in Fig. 2 for small $\langle k \rangle$ due to the more
heterogeneous degree distribution of the BA networks. The behavior is consistent with the
ER networks: the mean values depend strongly on $\langle k \rangle$ and weakly on $N$ if the nodes share
a direct connection, while the opposite is true if the nodes are not neighbors. Because the
fat tail of the degree distribution (with $P(k) \sim k^{-3}$) provides a broader distribution to the
connected GENs distribution than in the ER case (Fig. 2), the mean values are further
from the peaks of the distribution and are not indicated in the figure.

Figure 4: Fitting the heavy tail of the distribution of the nearest-neighbor GENs for
the Barabási-Albert networks in Fig. 3(A-B). Over a wide range of values, there is an
approximate power law decay which is very weakly dependent on $N$ ($N = 512$ in the red
circles and $N = 1024$ in the blue squares) but does depend on the average connectivity
($\langle k \rangle = 4$ shown above and $\langle k \rangle = 20$ below). This is consistent with the weak $N$ dependence
seen for the Erdős-Rényi networks in Eqs. 5 and 6.

Figure 5: Asymmetry in the Barabási-Albert GENs, $\delta E_{ij} = E_{ij} - E_{ji}$, compared with the
asymmetry in the MFPTs for those networks, $\delta\tau_{ij} = \tau_{ij} - \tau_{ji}$. In these density plots, darker
colors correspond to a greater observed frequency of the same $(\delta E, \delta\tau)$ pair. Shown are
two values of $N$ (512 and 1024), as well as two values of $\langle k \rangle$ (4 and 20). It is visually clear
that the asymmetry of the GENs is highly correlated with the asymmetry in the MFPT
(with the slope of the best fit line indicated).

Figure 6: The harmonic mean of the infection time of node $j$ with a single initially infected
node $i$, $h_{ij}$, in an Erdős-Rényi network. The SIR model is diagrammed in (A). The main
panels of (B-D) compare $h_{ij}$ with the GENs $E_{ji}$, while a comparison with the MFPT in
a random walk $\tau_{ij}$ is shown in the inset, with different colors denoting a different initially
infected node $i$. In all panels, $h_{ij}$ is scaled by the average infection time of the nodes, $H_i$, and
likewise the MFPT is scaled in terms of $T_i = (N-1)^{-1}\sum_{k \neq i} \tau_{ik}$. (B) shows $N = 512$ nodes
with $r_R = 0.0025\eta$. (C) $N = 512$ and $r_R = 0.5\eta$. (D) $N = 1024$ and $r_R = 0.0025\eta$, all
with $\langle k \rangle = 4$. A functional relationship between $h_{ij}/H_i$ and $E_{ij}$ is immediately apparent in
(B-D), which corresponds well to a simple power law for $r_R \ll \eta$, while the insets show that
the MFPT is a poor predictor of average infection time. The dashed lines show the best
power law fit for the data, with the scaling indicated in the figure. The functional behavior
is not universal, depending on the size of the network and the SIR model parameters. (E)
shows the standard deviation of the residuals as a function of $r_R/\eta$ for a power law fit
$h_{ij}/H_i \sim f(x_{ij})$ for $x_{ij} = E_{ji}$ (red), $R_{ij}$ (purple), and $\tau_{ij}/T_i$ (blue).
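
The relation between the MFPT and the resistance distance quoted above, $\tau_{ij} = \frac{1}{2}\sum_l W_l (R_{ij} + R_{jl} - R_{il})$, can be evaluated directly. The sketch below is an illustration in Python with NumPy under the assumption of a connected, undirected network; it obtains the resistance distance from the Moore-Penrose pseudo-inverse of the graph Laplacian, and the function names are hypothetical.

```python
import numpy as np

def resistance_distance(w):
    """R[i, j] from the pseudo-inverse of the graph Laplacian L = diag(W) - w."""
    w = np.asarray(w, dtype=float)
    laplacian = np.diag(w.sum(axis=1)) - w
    lp = np.linalg.pinv(laplacian)
    d = np.diag(lp)
    return d[:, None] + d[None, :] - 2.0 * lp

def mfpt_from_resistance(w):
    """tau[i, j] = 0.5 * sum_l W_l * (R_ij + R_jl - R_il)."""
    w = np.asarray(w, dtype=float)
    strength = w.sum(axis=1)                 # W_l
    R = resistance_distance(w)
    RW = R @ strength                        # RW[i] = sum_l R_il * W_l
    return 0.5 * (strength.sum() * R + RW[None, :] - RW[:, None])

# Unweighted three-node path 0 - 1 - 2 (hypothetical test case).
w_path = np.array([[0, 1, 0],
                   [1, 0, 1],
                   [0, 1, 0]], dtype=float)
tau = mfpt_from_resistance(w_path)
```

For this path, a walker starting on the leaf reaches the middle node in exactly one step, while a walker starting in the middle needs three steps on average to reach a given leaf, so $\tau_{01} = 1$ and $\tau_{10} = 3$. The higher-degree node is the easier one to reach, which is the asymmetry discussed in the text.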

A similar result, using the same values of $N$ and $\langle k \rangle$ for an Erdős-Rényi network, is shown in
Fig. 7 and displays the same qualitative features: the asymmetry in the GENs is highly
correlated with the asymmetry in the mean first passage times, and there is an apparent
scaling of $\delta E_{ij} \sim \delta\tau_{ji}/\sqrt{\alpha_{\mathrm{ER}} N}$ with $\alpha_{\mathrm{ER}} \approx 4$ to $7$ depending on the average degree. This
coefficient is on the order of the scaling coefficient for the BA networks in Fig. 5, with
$\alpha_{\mathrm{BA}} \approx 4$. While the structure of the deviation from the best fit line is interesting, note that
the logarithmic scale of the coloring means that these extreme outliers are still relatively
rare compared to the much higher density near the best fit line. The overall correlation
between the asymmetries of the GENs and the MFPTs indicates that $\delta E$ is a meaningful
quantity, capturing the topological details of the network. We have found that our iterative
method for computing the GENs (implemented in C++) converges far more rapidly than
directly computing the MFPT via a matrix pseudo-inversion [15] for the networks we have
considered (using the ‘pinv’ function in Matlab).


4 An application to epidemic spreading 

The spreading of an epidemic has been studied by many authors and in a wide range of
contexts [32 ESI ES GEIEI, with the SIR model being one of the simplest and most
commonly used. The susceptible-infected-recovered (SIR) model assumes that a population
of susceptible individuals becomes infected due to interactions with previously infected
individuals, and infected individuals may recover and become non-infectious. A simple
schematic of the SIR model is shown in Fig. 6(A), with infections occurring at a constant
rate, $\eta$, when individuals interact and recovery occurring at a constant rate, $r_R$. A number of
more complex models have been considered extensively for a homogeneously mixed
population of individuals [39], but non-uniform interactions between individuals, represented by
networks, can have a profound impact on the dynamics of epidemic spreading in the SIR
model [EUI3 HE]. The existence of epidemic thresholds [5, 40] for homogeneous networks (or
the lack thereof for scale-free networks [17]) is a well-studied global quantity of interest,
while more local quantities such as the probability of a particular node $i$ becoming infected,
sparking an epidemic [14], and quarantine or immunization strategies [38, 22] have also been
examined.

Figure 7: Asymmetry in the Erdős-Rényi GENs, $\Delta E_{ij} = E_{ij} - E_{ji}$, compared with the
asymmetry in the MFPTs for those networks, $\Delta\tau_{ij} = \tau_{ji} - \tau_{ij}$. The figure labels are
identical to those in Fig. 5, with a similar behavior in scaling and variation from the best
fit line.

While it is clearly useful to understand the global properties of the epidemic (such as the
expected number of infected individuals), a particular individual $j$ may also be interested
in its own probability of becoming infected given the current state of the disease, and
may reasonably be less concerned if no neighbors are infected than if many neighbors
are infected. However, it is not straightforward to analytically calculate how long the
disease will take to reach $j$ from any point in the network, and it would be useful to
have a measure for how ‘close’ the epidemic is to an individual node. If the infection
begins with a single node $i$, we expect that the disease will more rapidly propagate to
nodes to which $i$ feels topologically close, and it is therefore worthwhile to compare the
infection times in an SIR epidemic with the GENs as a proxy for closeness. To see the
relationship between infection time and closeness, we simulate an SIR epidemic using
Gillespie dynamics [33] on an Erdős-Rényi graph (with a uniform probability of connection
and an average degree of $\langle k \rangle = 4$). This allows us to determine the harmonic
mean of the infection time of a node $j$ given an initial infection at $i$ over the $K$ simulations,
$h_{ij}^{-1} = K^{-1}\sum_{k=1}^{K} [t_{ij}^{(k)}]^{-1}$, where $t_{ij}^{(k)}$ is the infection time of $j$ in simulation $k$
(the harmonic mean is chosen to avoid diverging infection times in a
simulation $k$ where the disease dies out before $j$ is infected). Because the GENs do
not naturally include the timescales of the system dynamics ($\eta$ and $r_R$), we normalize the
harmonic mean by $H_i = (N-1)^{-1}\sum_{j \neq i} h_{ij}$, which is the ‘typical’ time at which a node becomes
infected if the disease originates with $i$.
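
The simulation procedure described above can be sketched as follows. This is a deliberately simple Gillespie implementation for illustration only, not the code used for the results in this paper: it assumes an unweighted network stored as a list of neighbor lists, recomputes the list of susceptible-infected edges at every event (slow but transparent), and uses the hypothetical names `sir_infection_times`, `harmonic_mean_infection_time`, `eta`, and `r_rec` for the two functions and the infection and recovery rates.

```python
import numpy as np

def sir_infection_times(adj, seed, eta, r_rec, rng):
    """One Gillespie run of SIR; returns the first infection time of every node.

    adj   : list of neighbor lists (unweighted, undirected network)
    eta   : infection rate per susceptible-infected edge
    r_rec : recovery rate per infected node
    Nodes that are never infected keep an infinite infection time.
    """
    S, I, R = 0, 1, 2
    n = len(adj)
    state = np.full(n, S)
    t_inf = np.full(n, np.inf)
    state[seed], t_inf[seed] = I, 0.0
    t = 0.0
    while True:
        infected = np.flatnonzero(state == I)
        # one entry per susceptible-infected edge, so multiply exposed nodes count twice
        si_targets = [j for i in infected for j in adj[i] if state[j] == S]
        total = eta * len(si_targets) + r_rec * infected.size
        if total == 0.0:                       # nothing left that can happen
            break
        t += rng.exponential(1.0 / total)      # waiting time to the next event
        if rng.random() < eta * len(si_targets) / total:
            j = si_targets[rng.integers(len(si_targets))]
            state[j], t_inf[j] = I, t          # transmission along a random S-I edge
        else:
            state[infected[rng.integers(infected.size)]] = R   # recovery
    return t_inf

def harmonic_mean_infection_time(times):
    """Harmonic mean over runs; times has shape (K, N), infinite entries contribute zero."""
    times = np.asarray(times, dtype=float)
    with np.errstate(divide="ignore"):
        return 1.0 / (1.0 / times).mean(axis=0)

# Hypothetical usage: K = 200 runs on a three-node path, seeded at node 0.
rng = np.random.default_rng(0)
adj_path = [[1], [0, 2], [1]]
runs = [sir_infection_times(adj_path, 0, eta=1.0, r_rec=0.5, rng=rng) for _ in range(200)]
h_0 = harmonic_mean_infection_time(runs)
```

Runs in which the disease dies out before reaching $j$ enter the harmonic mean with a reciprocal of zero, so they lengthen $h_{ij}$ without making it diverge, which is the motivation given in the text for using the harmonic rather than the arithmetic mean.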

In Fig. 6(B-D) we compare $h_{ij}/H_i$ to $E_{ji}$ (how close node $i$ feels towards node $j$) for a
network of 512 nodes and $r_R = 0.0025\eta$ in (B); $N = 1024$ and $r_R = 0.0025\eta$ in (C);
and $N = 512$ and $r_R = 0.5\eta$ in (D). In all cases, $h_{ij}/H_i$ tends to increase with $E_{ji}$ (as is
expected: nodes that $i$ feels close to are infected faster), although the functional dependence of
the infection time on the topological closeness depends on the parameters in a non-universal
way. We also show in the insets that the scaled mean first passage time between nodes
in a random walk ($\tau_{ij}/T_i$) does a surprisingly poor job of predicting the infection time in
comparison with the GENs. This is likely due to the significant differences between the
SIR model and a random walk: if a walker departs from node $i$ at timestep $t$, it cannot
depart from $i$ again at time $t + 1$. In effect, there is only one ‘infectious’ node at a time
in a random walk, which makes the MFPT a poor estimator of the infection time in an SIR
simulation. The resistance distance between nodes $R_{ij}$ provides a fit quality similar to that
of the scaled MFPT (data not shown). In Fig. 6(E), the quality of the fit is quantified
using the standard deviation of the residuals of the fit. In all cases, the GENs are much
more highly correlated with the average time of infection between pairs of nodes in the
network. Fig. 6 indicates that the GENs $E_{ij}$ can more meaningfully capture the impact
of network topology on the dynamics of epidemic spreading than other global measures of
pairwise ‘closeness’ between nodes.


5 Measuring a personalized importance 

The GENs incorporate a simple idea of what is meant by the ‘closeness’ between nodes
in a network where limited resources are shared, and we expect that a node $j$ that feels
‘close’ to node $i$ (having small $E_{ij}$) considers node $i$ to be ‘important’ in some sense.
We may therefore regard the inverse of the closeness between nodes ($\psi_{ij} = E_{ij}^{-1}$) as an
un-normalized personalized measure of importance, allowing a ranking of all nodes in the
network from the perspective of the node $j$. Because $\psi_{ij}$ measures the importance of $i$ from
a particular node $j$ (rather than the network at large), it is not equivalent to a centrality
measure.

To gain insight into the meaning of the personalized importance, Eq. 1 can be expanded
in the limit of large $E_{ij}$ for $i \neq j$ (a valid expansion for $k_i = k$, shown in the SI, and
observed to be reasonable for more complex networks), yielding
$E_{ij}^{-1} \approx W_j^{-1}\left[w_{ij}^2 + \sum_{l \neq i} w_{jl} E_{il}^{-1}\right] + O(E_{ij}^{-2})$.
We can use this lowest-order expansion to define the linearized
importance assigned by node $j$ towards node $i$ ($\varphi_{ij}$) as

$$\varphi_{ij} = \frac{1}{W_j}\left[w_{ij}^2 + \sum_{l \neq i} w_{jl}\,\varphi_{il}\right], \qquad (7)$$

where $\varphi_{ii}$ is undefined (since $E_{ii} = 0$). Eq. 7 provides a natural interpretation of the
meaning of personalized importance. The importance $j$ assigns to $i$ is a combination
of two terms: a weighted average of the importance its neighbors assign to $i$, and an
importance of $w_{ij}$ assigned in the case of a direct connection between $i$ and $j$. Defining
$(\boldsymbol{\varphi})_{ij} = \varphi_{ij}$ and $(\mathbf{L})_{ij} = W_i\delta_{ij} - w_{ij}$ the graph
Laplacian [31, 15], Eq. 7 can be written $\sum_l L_{jl}\varphi_{il} = w_{ij}^2 - w_{ij}\varphi_{ii}$,
relating the linearized importance directly to the graph
Laplacian. Despite the simplicity of this interpretation of the linearized importance, $\varphi_{ij}$
is of limited use in unweighted networks (diagrammed in Fig. 8): if $w_{ij} \in \{0, 1\}$, Eq. 7
reduces to $k_j\varphi_{ij} = w_{ij} + \sum_{l \neq i} w_{jl}\varphi_{il}$, with the solution
$\varphi_{ij} = 1$ for all connected nodes. Only
in the context of weighted networks is a non-trivial measure of importance obtained. The
full nonlinear expression for the GENs (which incorporates higher order terms than the
linear approximation) does not suffer from this difficulty and can still determine meaningful
measures of importance between nodes.
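
The behavior of the linearized importance can be checked numerically on a small network. The sketch below assumes Eq. 7 has the form $W_j\varphi_{ij} = w_{ij}^2 + \sum_{l \neq i} w_{jl}\varphi_{il}$ with $W_j = \sum_l w_{jl}$, and solves it one target node $i$ at a time with NumPy. The function name `linearized_importance` and the example weights are illustrative and are not taken from the paper.

```python
import numpy as np

def linearized_importance(w):
    """Solve W_j*phi_ij = w_ij**2 + sum_{l != i} w_jl*phi_il for all i != j.

    w is a symmetric, non-negative weight matrix with zero diagonal that
    describes a connected network. phi[i, j] is the importance node j
    assigns to node i; the diagonal is NaN because phi_ii is undefined.
    """
    w = np.asarray(w, dtype=float)
    n = w.shape[0]
    strength = w.sum(axis=1)               # W_j = sum_l w_jl
    laplacian = np.diag(strength) - w      # L_jl = W_j*delta_jl - w_jl
    phi = np.full((n, n), np.nan)
    for i in range(n):
        others = np.delete(np.arange(n), i)           # every node j != i
        reduced = laplacian[np.ix_(others, others)]   # drop row and column i
        rhs = w[i, others] ** 2                       # direct term w_ij**2
        phi[i, others] = np.linalg.solve(reduced, rhs)
    return phi

# Unweighted path a - b - c: every defined entry equals 1.
path = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
phi_path = linearized_importance(path)

# Same path with a heavier a - b edge: the entries now differ.
weighted = np.array([[0, 2, 0], [2, 0, 1], [0, 1, 0]], dtype=float)
phi_weighted = linearized_importance(weighted)
```

Under these assumptions the unweighted path returns 1 for every pair, which is the degeneracy described above, while the weighted path yields distinct values.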

Figure 8: In this simple network, $\varphi_{cb} = \varphi_{ca}$, meaning the importance node $b$ assigns to
$c$ (with no direct connection between them) is identical to the importance $a$ assigns to $c$
(which share a direct connection). Intuitively, we would expect the lack of direct connection
to imply $b$ feels $c$ is less important (since $w_{bc} = 0$). This
condition is satisfied using the nonlinear importance, with $\psi_{cb} < \psi_{ca}$.

The nonlinear importance $\psi_{ij} = E_{ij}^{-1}$ can rapidly determine meaningful relationships
between nodes in complex networks. To illustrate this,
we consider the benchmark of Lancichinetti, Fortunato, and Radicchi (LFR) [11], which
constructs a network of communities of variable sizes $n$ (distributed as $P(n) \propto n^{-\beta}$), a
scale-free degree distribution of the nodes (with $P(k) \propto k^{-\gamma}$), and which is characterized by
the mixing parameter, $\mu$, as the fraction of inter-community edges. We have previously
shown [12] that the GENs can be used to detect the community structure underlying this
benchmark. When measuring the importance of a node, a global measure of centrality will
generally focus on nodes with high degree, but due to the heterogeneous density of edges
between communities, we expect a meaningful definition of the importance $j$ assigns to $i$ to
differ significantly depending on whether $i$ and $j$ are in the same community. Note that the
determination of the GENs does not require a priori knowledge of the community structure. In
Fig. 9(A), we determine the distribution of importance $\psi_{ij}$ between nodes $i$ and $j$ that do
not share a direct connection ($w_{ij} = 0$) for nodes within $i$'s community (red) and outside
of $i$'s community (blue). There is an immediately apparent difference in the distributions,
with a greater probability of a high importance if $i$ and $j$ are in the same community due
to the increased number of shared neighbors (even in the absence of a direct connection).
Increasing the LFR parameter $\mu$ (which increases the number of edges between communities)
reduces the difference in the distributions, but varying the other system parameters
has only a minor impact on the clear distinction between the two distributions (data not
shown). A large overlap between neighbors is not required to accurately detect meaningful
differences in the importance between pairs of nodes, though: in Fig. 9(B) we show cliques
of $n$ nodes with $A$, $B$, and $C$ having no neighbors in common, but due to the network
topology, $A$ is closer to $B$ than to $C$ (depicted in the inset of Fig. 9(B)). The GENs
determine that $B$ feels $A$ is more important than $C$ does. $\psi_{ij}$ incorporates direct connections
between nodes, proximity in the network, and shared neighbors into a meaningful measure
of personalized importance.

Figure 9: (A) The GENs applied to the LFR benchmark [11]. The red shows the distribution
of importance for nodes $i$ and $j$ that are in the same community but do not share a direct
connection. The blue shows the distribution for those in different communities (and still
sharing no direct connection). Due to the high density of links inside of the communities,
the GENs accurately indicate that $\psi_{ij}$ is likely to be larger if $i$ and $j$ are in the same
community. (B) In the inset, we diagram a simple network topology where cliques are
in greater proximity to one another but still have no neighbors in common. Each circle
represents a clique of $n$ nodes (fully connected with each edge having unit strength) and
each edge an all-to-all connection between cliques with each edge having weight $w$. Clique
$B$ is in close proximity to auxiliary nodes with integer distance 2 from $A$ (highlighted
in yellow) while clique $C$ is in closer proximity to auxiliary nodes of integer distance 3
(highlighted in green). In the main figure, $\psi_{ij}$ indicates that $A$ is more important from the
perspective of $B$ than from $C$ for varying values of $n$ and $w$ due to the closer proximity of
$B$'s neighbors to $A$ than $C$'s neighbors, with $\psi_{AB}/\psi_{AC} > 1$.
Symbol  Measure                       Description
k_i     Degree [45]                   Node degree
Pr_i    PageRank [45]                 i-th component of the first eigenvector of B
b_i     Betweenness centrality [15]   Number of times i is visited in a random walk,
                                      $\propto \sum_{j \in C_i}\sum_{l,m} |R_{il} + R_{jm} - R_{im} - R_{jl}|$
c_i     Random walk centrality [14]   The rate at which information flows from other nodes to i:
                                      $c_i^{-1} - c_j^{-1} = T_{ij} - T_{ji}$

Table 1: Some common methods of measuring the importance of a node $i$. These are
compared directly to the Erdos centrality $\Psi_i = \sum_{j \in C_i} E_{ij}^{-1}$ in Fig. 4 of the main text. This
list is not exhaustive, and other measures of centrality are possible [45]. $R_{ij}$ denotes the
resistance distance [15] between nodes $i$ and $j$, $T_{ij}$ the mean first passage time between
$i$ and $j$ in a random walk, and $B_{ij} = (1 - d)w_{ij}/W_j + d/N$ is the matrix of transition
probabilities for PageRank (where $d$ is the teleport probability of visiting a non-neighbor
in PageRank). Throughout the paper, each measure of centrality is normalized such that
the sum of all centralities in the network is 1.


6 Global importance and Erdos centrality 


Having defined a pairwise measure of the importance a node $j$ assigns to $i$ using $\psi_{ij}$ (or
the linearized $\varphi_{ij}$), we naturally expect that we can leverage this definition into a global
measure of the importance of node $i$. There already exist a wide variety of methods for
measuring centrality from a global perspective, including degree [45, 15], PageRank [45, 19],
random walk [14], and betweenness centrality [14, 15] (briefly described in Table 1).
Each measure ranks high-degree nodes above low-degree nodes but takes the global network
topology into account in a different way. These methods produce qualitatively similar but
quantitatively different node rankings, as reflected by the fractional intersection between the
top-$n$ orderings [47], $\sigma_{XY}(n) = \frac{1}{n}\sum_{k=1}^{n} |o_X(k) \cap o_Y(k)|/k$, with $o_X(k)$
the top-$k$ ordering using method $X$. $\sigma_{XY}(n)$ is shown comparing the ordering due to PageRank
with that found using the betweenness and random walk centrality measures for a Barabasi-Albert
network topology in Fig. 10(A). It is clear that there is an immediate drop to $\sigma > 0.95$
for small $n$ for these well known centralities (i.e. good but not perfect agreement on the
top few nodes), after which the methods tend to vary slowly or not at all for larger top-$n$
lists.
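
The intersection metric is simple to evaluate once two rankings are available. The sketch below assumes the form $\sigma_{XY}(n) = \frac{1}{n}\sum_{k=1}^{n} |o_X(k) \cap o_Y(k)|/k$, with each ranking given as a list of distinct node labels ordered from most to least important. The function name `intersection_metric` is illustrative.

```python
def intersection_metric(order_x, order_y, n):
    """Average, over k = 1..n, of the fractional overlap of the two top-k lists.

    order_x and order_y are sequences of distinct node labels sorted from
    most to least important under two different centrality measures.
    """
    if n < 1 or n > min(len(order_x), len(order_y)):
        raise ValueError("n must lie between 1 and the shorter list length")
    top_x, top_y = set(), set()
    total = 0.0
    for k in range(1, n + 1):
        top_x.add(order_x[k - 1])   # grow the top-k set of ranking X
        top_y.add(order_y[k - 1])   # grow the top-k set of ranking Y
        total += len(top_x & top_y) / k
    return total / n

same = intersection_metric(["a", "b", "c"], ["a", "b", "c"], 3)
swapped = intersection_metric(["a", "b", "c"], ["b", "a", "c"], 3)
```

Identical rankings give a value of 1, while swapping only the top two nodes of a three-node ranking gives 2/3, since the top-1 lists disagree but the top-2 and top-3 lists coincide.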


Figure 10: Centrality for a Barabasi-Albert network with $\langle k \rangle = 20$. (A) uses the
intersection metric $\sigma_{XY}(n)$ to estimate the similarity in top-$n$ lists between PageRank and two
centrality measures listed in Table 1 in the Supplementary information. While there is
good correspondence between these measures, each measure gives a slightly different
ordering for small $n$. (B) The Erdos centrality (x-axis) compared to the 4 common centrality
measures (y-axis) shows an obvious positive correlation overall. Circles show degree
centrality, squares PageRank, diamonds betweenness centrality and triangles random walk
centrality. The clustering of some data near discrete values is due to the heterogeneity of
the Erdos centrality for nodes of equal degree (this effect is more pronounced in Fig. 11
with $\langle k \rangle = 4$). (C) shows the intersection metric between the top $n$ elements of the Erdos
centrality ($o_E(n)$) and the top $n$ elements of the other centrality measures for varying $n$.
The GENs are about as consistent with known measures of centrality as these measures
are amongst themselves (shown in (A)).

To convert our personalized importance measures into a single global measure, we define
$\Psi_i = \sum_{j \in C_i} \psi_{ij}$ as the sum of the importance the neighbors of $i$ assign to it (akin to the
approach of Ref. [32]). For an unweighted network (with $w_{ij} = 1$ for all connected $i$ and $j$), the
linearized $\Phi_i = \sum_{j \in C_i} \varphi_{ij} = k_i$ is equivalent to the degree centrality. In Fig. 10(B), we
compare $\Psi_i$ to the other measures of centrality in Table 1 for a single realization of a
Barabasi-Albert network with $N = 512$ and $\langle k \rangle = 20$. In all cases, there is an obvious
correlation between these measures of centrality but with significant differences between
centrality measures in some cases for both central and non-central nodes alike. Degree
and random walk centrality both tend to be strongly clustered around discrete values,
leading to the banded structures seen in Fig. 10(B), and the difference between these and
PageRank seen in Fig. 10(A). The Erdos centrality takes the global topology into account
differently than these measures of centrality (similar to the betweenness and PageRank
centrality). When comparing the Erdos centrality to those in Table 1, the similarity
between the orderings from most- to least-important [38, 49] using different measures is of
particular interest [20, 21]. We compare the Erdos centrality ordering to the other measures
of centrality using $\sigma_{XY}(n)$ in Fig. 10(C) and see that the Erdos centrality is consistent
with other measures of centrality: a sharp drop initially with slow variation for larger $n$ and
$\sigma(n) \lesssim 0.95$ throughout. Despite the different formulations of the Erdos centrality
and PageRank, the top-$n$ list for $\Psi_i$ compares best to the list from $Pr_i$ (dashed purple
line) for high degree nodes but begins to agree better with betweenness centrality when
less central nodes are also included.
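
The neighbor sum that defines the Erdos centrality can be written in a few lines once the GENs are known. The sketch below assumes a precomputed matrix `E` in which `E[i, j]` is the closeness $j$ feels towards $i$, and normalizes the result so that the centralities sum to 1. The numerical entries of `E` in the example are placeholders chosen for illustration; they are not computed from Eq. 1.

```python
import numpy as np

def erdos_centrality(E, w):
    """Psi_i = sum over neighbors j of i of 1/E_ij, normalized to sum to 1.

    E[i, j] is the generalized Erdos number describing how close j feels
    to i, and w is the weight matrix whose positive entries mark neighbors.
    """
    E = np.asarray(E, dtype=float)
    neighbors = np.asarray(w) > 0
    np.fill_diagonal(neighbors, False)      # a node is not its own neighbor
    psi = np.zeros_like(E)
    psi[neighbors] = 1.0 / E[neighbors]     # personalized importance psi_ij
    centrality = psi.sum(axis=1)            # sum over the neighbors of i
    return centrality / centrality.sum()

# Star network: node 0 is the center, nodes 1 and 2 are leaves.
w_star = np.array([[0, 1, 1], [1, 0, 0], [1, 0, 0]], dtype=float)
# Placeholder closeness values (hypothetical, for illustration only).
E_star = np.array([[0, 1, 1], [2, 0, 3], [2, 3, 0]], dtype=float)
centrality_star = erdos_centrality(E_star, w_star)
```

With these placeholder values the center of the star receives two thirds of the total centrality and each leaf one sixth. Only neighbor pairs enter the sum, so the leaf-to-leaf entries of `E` do not contribute.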


We have also computed the Erdos centrality and other measures of centrality for a BA
network with $N = 512$ and $\langle k \rangle = 4$. While the discreteness of the degree distribution is
still apparent in Fig. 10(B) (with the tight clustering of points about discrete values), it
has a greater impact in Fig. 11, where $k$ attains fewer values due to the significantly
smaller average degree. The general trends are qualitatively similar to those in Fig. 10,
though: the betweenness centrality (Fig. 11(C)) depends on the degree in a qualitatively
different manner than any other measure (and is more clearly correlated to the Erdos
centrality for low-ranked nodes). The Erdos centrality does not appear to be linearly
related to the random walk centrality (although they are still correlated), and clear bands
are observed due to nodes of equal degree for both the degree and PageRank measures of
centrality.

Figure 11: Comparing the Erdos centrality with other measures of centrality for Barabasi-Albert
networks with a lower average degree of $\langle k \rangle = 4$. The symbol color and shape are
the same as in Fig. 10. There is pronounced clustering in the data due to the strong
dependence the degree and PageRank measures of centrality have on the node degree, and
the Erdos centrality will sometimes label a node as more important than another node
with a somewhat higher degree. The overall rankings are still consistent, with high degree
nodes being typically more important.

Figure 12: Schematic diagram of the continuum model with long range connections. (A)
shows the reduction of the continuum to a network of material nodes, with the weight
between them dependent only on their distance, $u(x, x') = e^{-\lambda|x - x'|/a}$. (B) shows the
embedding of long range connections between network nodes, with weight specified by the
network rather than depending on the distance between the nodes.

It is interesting to note that $\psi_{ij}$ can define a modified connectivity matrix, incorporating
both direct connections in the adjacency matrix and non-local paths between nodes. A
random walk performed with a transition probability $B'_{ij} = \psi_{ij} / \sum_l \psi_{lj}$ (with the
convention $B'_{ii} = 0$, meaning the walker never remains at $i$) has a similar interpretation: a
walker at node $j$ has a relatively high probability of moving to node $i$ if they are directly
connected in the original network (if $w_{ij} > 0$), but has a non-zero probability of jumping to
a disconnected node. The matrix $B'$ can be compared to the PageRank transition
probability matrix $B$, which has a uniform probability of teleporting to any node in the network
(regardless of the network topology). $B'$ has a jump probability that is not purely random
but rather preferentially lands on nodes that $j$ finds important, in contrast to PageRank's
uniform teleport probability. An alternate Erdos measure of centrality can be defined as
the leading eigenvector of the matrix $B'$ (akin to PageRank, the steady state probability of
being found at a node $i$ under the transition probability $B$). However, the computational
efficiency of the simple sum $\Psi_i = \sum_{j \in C_i} \psi_{ij}$ and its clear correlation with known measures
of centrality (seen in Figs. 10 and 11) make our current definition of the Erdos centrality
preferable.
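
The alternate, eigenvector-based centrality can be sketched with a short power iteration. The code below assumes $B'_{ij} = \psi_{ij}/\sum_l \psi_{lj}$ with $B'_{ii} = 0$, so that each column of $B'$ is the distribution of the next step for a walker leaving node $j$. The function names and the entries of `psi_example` are hypothetical and only serve to exercise the construction.

```python
import numpy as np

def importance_walk_matrix(psi):
    """Column-stochastic B' with B'_ij = psi_ij / sum_l psi_lj and B'_ii = 0."""
    B = np.array(psi, dtype=float)
    np.fill_diagonal(B, 0.0)                      # the walker never stays put
    return B / B.sum(axis=0, keepdims=True)       # normalize each column j

def stationary_distribution(B, tol=1e-12, max_iter=10_000):
    """Power iteration for the leading eigenvector of a column-stochastic matrix."""
    p = np.full(B.shape[0], 1.0 / B.shape[0])     # start from a uniform guess
    for _ in range(max_iter):
        p_next = B @ p
        p_next /= p_next.sum()
        if np.abs(p_next - p).sum() < tol:
            return p_next
        p = p_next
    raise RuntimeError("power iteration did not converge")

# Hypothetical personalized importances psi[i, j] for a three-node network.
psi_example = np.array([[0, 2, 1], [1, 0, 3], [4, 1, 0]], dtype=float)
B_prime = importance_walk_matrix(psi_example)
steady_state = stationary_distribution(B_prime)
```

The neighbor sum $\Psi_i$ needs only one pass over the edges, whereas this variant requires the full matrix of pairwise importances and an iterative solve, which is the trade-off noted above.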

Because the GENs can be used to describe random walks efficiently while prescribing a
topology-dependent teleportation via the matrix $B'$ (with $B'_{ij} = \psi_{ij} / \sum_l \psi_{lj}$ discussed in
Sec. 6), they may be of use in future studies as well. Walkers on $B'$ will tend to remain
trapped in regions of locally dense edges, and implementing such a walk in community
detection methods may increase the accuracy of the determined partition [50]. Similarly,
the modified walk may be of use in modeling systems where jumps to non-neighboring
(but topologically close) nodes are desirable, such as epidemic spreading in a social network.
Epidemics have been modeled [18] as spreading with some probability to direct contacts
(due to the assumed high probability they interact directly) and with a lower probability
‘jumping’ to nodes without a direct contact (with non-neighbor transmission possible due
to rare transient interactions). We could expect that non-neighbor interactions will not
have a uniform probability, but rather that each node is more likely to transiently interact
with friends-of-friends (or individuals s/he feels topologically close to). Epidemic spreading
on $B'$ will capture our expectation that transmission to friends with whom we have regular
contact is most likely, but transmission to friends of friends is more likely than transmission
to very distant people in the social network.


7 GENs in the continuum limit 

We showed in the previous sections that the Generalized Erdos Numbers can be useful
in describing complex networks in a variety of contexts by incorporating the global network
topology into a measure of closeness between nodes. Many physical networks exist in a
metric space [3], such as wireless sensor networks [24, 25] or transportation [26, 27] networks,
where the strength of the direct interaction between two nodes depends on their locations.
In many contexts, such as transportation networks [6], distance-independent connection
strengths may coexist with geometrically defined links (for example, surface mobility versus
air travel). It is natural to wonder what impact this may have on the GENs, and how
topological closeness is influenced by physical proximity. We assume that a network of
long-range connections has been externally defined, and that each node in that network can be
assigned to a location in some $d$ dimensional space (a schematic is shown in Fig. 12(A)). Each
node $i$ in the original, distance-independent network (of $N$ nodes) occupies a location $y_i$,
with an interaction strength $w_{ij}$. We refer to these nodes as ‘network nodes’, each of which
has at least one direct connection due to the connectivity of the original network (referred
to as a long-range connection). To introduce geometry into the problem, we also define a
network of $N^*$ ‘material nodes’ at locations $\{x_k\}$, whose interactions are
purely geometric (schematically diagrammed in Fig. 12 for a spherical geometry). The
distance-dependent interaction between nodes of any type (both network and material
nodes) at locations $z$ and $z'$ is given by $u(z, z')$, with the constraint that $u(z, z') = u(z', z)$
to ensure the network remains undirected. Note that distance-dependent connections are
drawn between network nodes as well, so the total weight between $y_i$ and $y_j$ is
$w_{ij} + u(y_i, y_j)$. If $N^* = 0$ or $u(x, x') = 0$ the physical geometry is irrelevant and the original
network is recovered, while if $N = 0$ or $w_{ij} = 0$, then the weights between nodes are
determined only by node locations.

Replacing discrete equations with continuum models has been of use in a wide variety of
systems [51], and for networks with large $N^*$, we can establish a continuum limit of the
definition of the GENs in Eq. 1 of the main text. In SI Sec. A, we develop a linearized
continuum model for the approximate pairwise importance $\varphi(x, x')$ and describe a more
complete (and complex) nonlinear continuum model in SI Sec. D. An inhomogeneous
Fredholm equation is found for the purely geometric network (where the number of
long-range network nodes $N = 0$ or equivalently $w_{ij} = 0$), with

$$\varphi(x, x') = \frac{1}{U(x')}\left[u^2(x, x') + \int dz\, u(x', z)\,\varphi(x, z)\left(\rho(z) - \delta(x - z)\right)\right], \qquad (8)$$

where $\rho(z)$ is the local density of material nodes at $z$ and $U(x) = \int dz\,\rho(z)u(x, z)$ is the
strength of the material node at $x$. The delta function removes the (undefined) factor of
$\varphi(x, x)$ (necessary because of the discrete requirement that $E_{ii} = 0$), and is equivalent to
the removal of a self-energy. Note that if $u(x, x') \neq$ constant, the linearized importance
will yield a non-trivial result ($\varphi(x, x') \neq$ constant). It is straightforward to show that the
solution to Eq. 8 is $\varphi(x, x') = \tilde{\varphi}(x, x') - \tilde{\varphi}(x, x)$, where
$U(x')\tilde{\varphi}(x, x') = u^2(x, x') + \int dz\,\rho(z)u(x', z)\tilde{\varphi}(x, z)$ is the importance without
removal of the self-energy term. There
are a wide range of methods to determine $\varphi(x, x')$ analytically [52, 53], but determining an
exact expression is a non-trivial task even for simple geometries (see SI Sec. C).
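
The statement about the solution of Eq. 8 can be checked by direct substitution. The short derivation below assumes Eq. 8 has the form $U(x')\varphi(x, x') = u^2(x, x') + \int dz\, u(x', z)\varphi(x, z)[\rho(z) - \delta(x - z)]$, and writes $\tilde{\varphi}$ for the importance without removal of the self-energy term (the tilde is a notational choice made here).

```latex
% Substitute \varphi(x,z) = \tilde\varphi(x,z) - c(x), with c(x) = \tilde\varphi(x,x),
% into the right-hand side of Eq. 8.
\begin{aligned}
u^2(x,x') &+ \int dz\, u(x',z)\,\bigl[\tilde\varphi(x,z) - c(x)\bigr]\,\bigl[\rho(z) - \delta(x-z)\bigr] \\
 &= u^2(x,x') + \int dz\, \rho(z)\,u(x',z)\,\tilde\varphi(x,z)
    \;-\; c(x)\,U(x') \;-\; u(x',x)\,\bigl[\tilde\varphi(x,x) - c(x)\bigr] \\
 &= U(x')\,\tilde\varphi(x,x') \;-\; c(x)\,U(x') \;-\; 0 \\
 &= U(x')\,\bigl[\tilde\varphi(x,x') - \tilde\varphi(x,x)\bigr]
  \;=\; U(x')\,\varphi(x,x') .
\end{aligned}
```

The second line uses $U(x') = \int dz\,\rho(z)u(x', z)$, and the third uses the defining equation for $\tilde{\varphi}$ together with $c(x) = \tilde{\varphi}(x, x)$, which makes the delta-function term vanish.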

Figure 13: (A) shows the importance that a node at $x'$ assigns to a node at one of the
poles of the sphere, $x = (R, 0, 0)$, for all nodes in a sphere of radius $R = 5a$ with a weight
$u(x, x') = e^{-|x - x'|/a}$ and no long-range connections. The importance assigned by a node at
$x'$ towards the reference point at $x$ decreases with an empirically determined scaling which
is not expected to be universal. (B) shows the importance of the origin ($x = (0, 0, 0)$) as
a function of the distance $|x'|$, which decreases more slowly than the importance assigned
to a node at the pole due to $x$'s central location in the network. (C) shows the global
centrality of each node $\Psi(x)$ as a function of its distance from the origin. Sharp variations
for particular values of $\Psi(x)$ are due to nodes with a differing local connectivity but an
identical distance from the origin (e.g. $|x| = 5a$ for $x = (0, 0, 5)$ and for $x = (0, 4, 3)$). The
overall decrease in $\Psi(x)$ as a function of $|x|$ is readily apparent, with the origin being most
important as expected.

Figure 14: The asymmetry $\delta E(x, x') = E(x, x') - E(x', x)$ for a sphere of radius $R = 5a$
(with no long-range interactions) as a function of the distance between the nodes.
Deeper red points correspond to $x'$ near the origin
and blue points correspond to $x'$ near the boundary of the sphere, so nodes on the periphery
feel the central nodes are more important than vice versa. The relationship between $\delta E_{ij}$
and $\delta\tau_{ij}$ in Fig. 2 of the main text suggests that random walks reach the center of the
sphere more readily than a specified point on the surface.

If we combine short ranged geometrical interactions with long range, geometry-independent
edges, the expression for the GENs and the linearized importance $\varphi$ becomes more complex.
Eq. 8 is valid only in the absence of long range connections, but we show in SI Sec. B that
the linearized importance can still be written in the simple form

$$\varphi(x, x') = \varphi_0(x, x') + \int_\Omega dz\, K(x, x', z)\,\varphi(x, z), \qquad (9)$$

with the complicated functional forms of $\varphi_0(x, x')$ and the kernel $K(x, x', z)$ explicitly given
in SI Sec. B. While Eq. 9 is linear (and therefore analytically tractable), all methods for
determining $\varphi(x, x')$ will require integrals that we show in
SI Sec. C are difficult to evaluate even in simple domains and for simple interactions $u(x, x')$.
Eqs. 8 and 9 can still be solved numerically over finite domains using well known methods [52, 53]
in cases where exact results cannot be obtained. The global importance, $\Phi(x)$, is straightforwardly
computed from Eq. 9 using $\Phi(x) = \int dz\,\rho(z)\varphi(x, z)$ (assuming $u(x, x') > 0$ for all
$|x - x'| < \infty$).

While Eq. 9, describing the continuum linearized importance, is analytically approachable,
the full nonlinear theory for the continuum system is entirely intractable. Because the
undesirable aspects of the linearized importance shown in Fig. 8 are expected to persist even
in the continuum limit, we expect the nonlinear theory to be required to yield meaningful
estimates of personalized closeness or global centrality. To understand the effect of the
material interactions, we determine the GENs for a sphere of radius $R$ formed from a
lattice of equally spaced material nodes with an inter-node spacing $a$ (i.e. with a constant
node density $\rho$). It is straightforward to solve for the GENs in Eq. 1 of the main
text numerically for any geometry, which is equivalent to solving the integral equation via
quadrature [52]. For $R = 5a$, the $N + N^* = 515$ nodes in the network are connected with
a strength $u(x, x') = e^{-\lambda|x - x'|/a}$ (with $\lambda = 1$ chosen here). For a purely material sphere
(where the interaction strength is determined entirely by $u(x, x')$ and the number of
network nodes is $N = 0$), the closeness between two points is determined primarily by their
relative distance, as shown in Fig. 13(A-B). While the expected decrease in the importance
of $x$ from the perspective of another node at $x'$ for increasing $|x - x'|$ is observed, the
qualitative behavior of the importance depends strongly on the locations of both points, with
an empirically derived scaling in which $\psi(x, x')$ varies as a position-dependent power of
$\log(|x - x'|/a + 1)$ plus a constant. We also find
that nodes towards the center of the sphere have a higher global importance $\Phi(x)$ than
those towards the boundary (Fig. 13(C)), due to the greater number of paths towards
the center than along the surface. We likewise expect that the asymmetry in the closeness,
$\delta E(x, x') = E(x, x') - E(x', x)$, is skewed such that more external nodes feel closer to
internal nodes than vice versa, due to the greater overall importance of the central nodes.
This is confirmed in Fig. 14, where $\delta E(x, x') < 0$ is far more likely to be seen at the
boundary of the sphere, regardless of the distance between the nodes. Coupled with Fig. 2 of
the main text we expect that $\delta\tau(x, x') \approx -\delta E(x, x') \times \sqrt{\alpha N}$, for some
topology-dependent $\alpha$, will accurately predict the asymmetry in the mean first passage time
of a random walk between nodes in the sphere.
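
The purely material sphere is straightforward to construct. The sketch below assumes a simple cubic lattice centered on the origin, lengths measured in units of the spacing $a$, the weight $u(x, x') = e^{-\lambda|x - x'|/a}$ with $\lambda = 1$, and no self-interaction. It builds only the weights and node strengths; solving Eq. 1 for the GENs is not attempted here, and the function names are illustrative.

```python
import itertools
import numpy as np

def lattice_sphere(radius):
    """Integer lattice points (in units of a) with |x| <= radius."""
    r = int(np.floor(radius))
    pts = [p for p in itertools.product(range(-r, r + 1), repeat=3)
           if p[0] ** 2 + p[1] ** 2 + p[2] ** 2 <= radius ** 2]
    return np.array(pts, dtype=float)

def material_weights(points, lam=1.0):
    """u(x, x') = exp(-lam*|x - x'|/a) with a = 1 and a zero diagonal."""
    diff = points[:, None, :] - points[None, :, :]
    u = np.exp(-lam * np.linalg.norm(diff, axis=-1))
    np.fill_diagonal(u, 0.0)                # no self-loops
    return u

points = lattice_sphere(5)
u = material_weights(points)
strength = u.sum(axis=1)                    # U(x) = sum_j u(x, x_j)

# Locate one pole, x = (R, 0, 0), and the origin.
pole = np.flatnonzero((points == [5.0, 0.0, 0.0]).all(axis=1))[0]
origin = np.flatnonzero((points == 0.0).all(axis=1))[0]
print(len(points), strength[pole], strength[origin])
```

Under these assumptions the lattice contains 515 nodes, and the strength at a pole is smaller than at the origin because a pole has fewer nearby nodes.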


[Figure 15 panels: (A) Hub at Origin, w = 5 u(0); (B) Direct Connections, w = u(0); (C) No Long-Range Connections]

Figure 15: The importance felt by a node located in the $x$-$y$ plane towards the node at
$x = (R, 0, 0)$. A schematic of the network topology is shown on top, and the importance
$\psi(x, x')$ for all points in the $x$-$y$ plane is shown below, with red corresponding to a higher
importance. (A) shows the effect of long-range connections between the poles and the center
of the sphere (with each long-range edge having weight $w_{ij} = 5$) for all edges relative to
$x = (R, 0, 0)$ (marked with the blue sphere). While each pole feels somewhat closer to
nearby nodes than it does to distant nodes, it is still clear that the poles are strongly
connected. (B) shows a network with the same strength of all nodes but the origin and a
differing topology of direct connections between poles with weight $w_{ij} = 1$. Surprisingly,
the importance assigned between poles is lower in (B) than (A) (i.e. stronger indirect
connections produce a greater pole-to-pole importance than weaker direct connections),
and the range of importance due to the distance-dependent connections is increased. (C)
shows the material-only case with $N = 0$ (shown in Fig. 13(A) as well).


The influence of long-range network topology greatly increases the complexity of the
problem and obscures our understanding of the closeness between nodes in the sphere. To
study the influence of the topology of the long range connections, we consider two very
simple networks embedded in our sphere: long range connections between the poles of the
sphere passing through a central hub, and direct connections between the poles (pictured
in Fig. 15). In Fig. 15(A), we show the network nodes at the poles indirectly linked to
each other through strong connections to the center of the sphere of strength $w_{ij} = 5$ (so
$W = \frac{1}{2}\sum_{ij} w_{ij} = 30$). Also shown is a cross-section of the sphere, showing the closeness
between points to one of the poles (marked with the blue dot in the sphere above). The
total strength of the distance-dependent interactions is $U(x) = \sum_j u(x, x_j) \approx 7.6$ at the
poles, so the contribution of the material nodes is comparable to that of the long-range
connections. Unsurprisingly, the center of the sphere feels each of the poles is important
(the central red region in the cross-section in Fig. 15(A)), as do each of the other
network-node poles. We compare the central-hub topology in Fig. 15(A) to that of
direct connections between poles, depicted in Fig. 15(B). To keep the strength of the poles
constant, the long range connections all have weight $w_{ij} = 1$ (but $W = 15$). Despite
the constant total strength of the poles, the increased number of edges per pole decreases
the importance each pole assigns to the other, reflected in the fact that the pole-to-pole
importance, $\psi(x, x')$, in (B) is reduced by about 5% relative to (A). Indirect connections
between the poles can therefore lead to a greater importance between them than direct
connections due to the effect of short-ranged interactions with material nodes. The decay
length of the importance assigned by material nodes to the network nodes is also increased
for the direct connections in (B) relative to the range in (A) (not shown), suggesting that
the reduced importance between network nodes is due to an increased importance of the
material nodes. In situations where finite resources are shared (e.g. a random walk or
diffusion of information), direct connections to a central hub will more easily allow the
resource to be targeted to specific sites in comparison to multiple direct connections.
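
The bookkeeping for the two long-range topologies can be verified directly. The sketch below assumes six poles (the points $\pm R$ along each coordinate axis), a hub topology in which every pole is linked to the center with weight 5, and a direct topology in which every pair of poles is linked with weight 1. The function names and node ordering are illustrative.

```python
import numpy as np

def hub_topology(n_poles=6, weight=5.0):
    """Long-range weights with each pole (indices 0..n_poles-1) tied to a hub (last index)."""
    w = np.zeros((n_poles + 1, n_poles + 1))
    w[:n_poles, n_poles] = weight
    w[n_poles, :n_poles] = weight
    return w

def direct_topology(n_poles=6, weight=1.0):
    """Long-range weights with every pair of poles linked directly."""
    w = np.full((n_poles, n_poles), weight)
    np.fill_diagonal(w, 0.0)
    return w

hub = hub_topology()
direct = direct_topology()

total_hub = hub.sum() / 2.0              # W = (1/2) * sum_ij w_ij
total_direct = direct.sum() / 2.0
pole_strength_hub = hub[:6].sum(axis=1)  # long-range strength of each pole
pole_strength_direct = direct.sum(axis=1)
```

Each pole carries a long-range strength of 5 in both constructions, while the total weight is 30 for the hub and 15 for the direct links, matching the values quoted above. The difference between the two cases is therefore the number of edges per pole rather than the pole strength.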


8 Conclusions 

In this paper, we have shown the utility of the Generalized Erdos Numbers in a variety of
contexts on both a local and a global level. The asymmetry of the GENs is highly correlated
to the asymmetry in the MFPT, another asymmetric quantity of use in network science.
The ability of the GENs to generate a global measure of closeness between nodes in a
network has been shown in the context of defining both a global and personalized measure
of the importance or centrality. Our global Erdos centrality is consistent with other well
known centrality measures with the added benefit of a greater heterogeneity for nodes of
equal degree in many cases (see Fig. ErdosCentrality.fig). The personalized importance
$\psi_{ij}$ from which the Erdos centrality is derived is shown to be useful in other contexts, with
a specific example studying the influence specific locations of a sphere can have on other
locations. Having demonstrated the usefulness of the GENs in many contexts here, future
work will apply these methods to problems involving network growth or dynamics
on networks in more concrete cases of physical or biological importance.

A Formulation of the linearized continuum model


In order to illustrate the emergence of both the node density and the delta function
constraints in the continuum limit (Eq. 8 of the main text), it is useful to begin by calculating
$U(x)$ in the case where network nodes are not included (i.e. with interactions only between
material nodes). We presume that there are a sufficiently large number of material nodes $N^*$
so that we can replace the sums over all nodes (such as in Eq. 1 of the main text) with an
integral. The interaction between the material nodes $i$ and $j$ is $u(x_i, x_j)$. For the discrete
system, the total strength of the material node $j$ is given by


$$U_j = \sum_{i \neq j} u(x_i, x_j) = \sum_i u(x_i, x_j) - u(x_j, x_j), \qquad (10)$$


where the second term imposes the constraint of no self-interactions (no loops) in the
network. This constraint can be avoided by allowing self-loops, which would have no impact on
the continuum model for the linearized importance but would have a minor
impact on the non-linear theory. Because the GENs were originally developed for networks
without self-interactions, we retain that assumption in this paper. If each material node
were on a lattice with spacing $a$, the volume excluded by each node would be $a^d$. We could
then write


$$U_j = \sum_i \frac{a^d\,u(x_i, x_j)}{a^d} - u(x_j, x_j) \to \int dz\,\rho\,u(z, x_j)\left[1 - \frac{\delta(x_j - z)}{\rho}\right] = U(x_j) - u(x_j, x_j) \qquad (11)$$


in the limit of $N^* \to \infty$ and $a \to 0$, where $\rho = 1/a^d$ is the constant number 
density of the nodes at the location $\mathbf{z}$, subject to the constraint $\rho V = N^*$, 
where $V$ is the volume of the domain, and $U(\mathbf{x})$ is the total strength with the 
integral over all space (i.e. without neglecting the self-loop). We have kept the continuum 
version of $U(\mathbf{x})$ to be the integral over all space for convenience, but must 
remember to remove the contribution from the self-loop as in Eq. 11. For material nodes 
that are not on a lattice, we can replace the effective volume $a^d$ with the volume 
excluded by the node $j$, $v(\mathbf{x}_j)$. Defining the variable density 
$\rho(\mathbf{x}) = 1/v(\mathbf{x})$ then produces the strength 

$$
U(\mathbf{x}) = \sum_{j} \frac{v(\mathbf{x}_j)\, u(\mathbf{x},\mathbf{x}_j)}{v(\mathbf{x}_j)}
    \to \int d\mathbf{z}\, \rho(\mathbf{z})\, u(\mathbf{z},\mathbf{x}),
\tag{12}
$$

where again $U_j \to U(\mathbf{x}_j) - u(\mathbf{x}_j,\mathbf{x}_j)$. A similar limit will 
be found for any other sum that must avoid self-loops in the continuum limit, such as in 
Eq. [1] of the main text. 
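
The continuum limit described above can be checked numerically in a simple setting. The 
sketch below is illustrative only: it assumes a one-dimensional lattice (d = 1) and an 
exponential interaction u(|x - x'|) = exp(-|x - x'|), neither of which is prescribed by 
the text, and the function names are hypothetical. It compares the discrete sum over 
i != j with the continuum expression, i.e. the density times the integral of u minus the 
self-loop term. 

```python
import math

def discrete_strength(a, cutoff=60.0):
    """Sum of u(|x_i - x_j|) = exp(-|x_i - x_j|) over lattice sites i != j.

    Node j sits at the origin of a 1D lattice with spacing a; the sum is
    truncated at |x_i| <= cutoff, beyond which the interaction is negligible.
    """
    n_max = int(cutoff / a)
    return 2.0 * sum(math.exp(-n * a) for n in range(1, n_max + 1))

def continuum_strength(a):
    """rho * integral of exp(-|z|) over the line, minus the self-loop u(0) = 1."""
    rho = 1.0 / a          # number density of a 1D lattice with spacing a
    total = rho * 2.0      # the integral of exp(-|z|) over the real line is 2
    return total - 1.0     # remove the self-interaction u(x_j, x_j)

for a in (0.5, 0.1, 0.01):
    d, c = discrete_strength(a), continuum_strength(a)
    print(f"a={a}: discrete={d:.6f} continuum={c:.6f} rel.diff={(d - c) / d:.2e}")
```

Under these assumptions the two expressions approach each other as the lattice spacing 
shrinks, which is the sense in which the sum can be replaced by an integral with the 
self-loop removed. 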


B Appendix: Including long-range connections in the continuum model 

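Labeling $\varphi^{(n\to m)}(\mathbf{x},\mathbf{y})$ as the importance that a network node 
at $\mathbf{y}$ (connected to a long-range edge) assigns to a material node at $\mathbf{x}$ 
(with no long-range connection), the linearized importance between material nodes reduces 
to the coupled equations 

$$
\begin{aligned}
U(\mathbf{x}')\varphi(\mathbf{x},\mathbf{x}') ={}& u^2(\mathbf{x},\mathbf{x}')
  + \int d\mathbf{z}\, u(\mathbf{x}',\mathbf{z})\,\varphi(\mathbf{x},\mathbf{z})
    \left[ \rho(\mathbf{z}) - \delta(\mathbf{x}-\mathbf{z}) \right] \\
 & + \sum_{k} u(\mathbf{x}',\mathbf{y}_k)
    \left[ \varphi^{(n\to m)}(\mathbf{x},\mathbf{y}_k) - \varphi(\mathbf{x},\mathbf{y}_k) \right]
\end{aligned}
\tag{13}
$$

and 

$$
\begin{aligned}
\left[ U(\mathbf{y}_k) + W_k \right] \varphi^{(n\to m)}(\mathbf{x},\mathbf{y}_k) ={}&
  u^2(\mathbf{x},\mathbf{y}_k)
  + \int d\mathbf{z}\, u(\mathbf{y}_k,\mathbf{z})\,\varphi(\mathbf{x},\mathbf{z})
    \left[ \rho(\mathbf{z}) - \delta(\mathbf{x}-\mathbf{z}) \right] \\
 & + \sum_{l} \left\{ \left[ w_{kl} + u(\mathbf{y}_k,\mathbf{y}_l) \right]
    \varphi^{(n\to m)}(\mathbf{x},\mathbf{y}_l)
    - u(\mathbf{y}_k,\mathbf{y}_l)\,\varphi(\mathbf{x},\mathbf{y}_l) \right\},
\end{aligned}
\tag{14}
$$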


where Eq. 13 describes the importance between material nodes, and Eq. 14 describes the 
importance from the $N$ network nodes to the material nodes. Eq. 13 is identical to Eq. [8] 
of the main text with a set of additional $\delta$-function constraints at the locations of 
the network nodes (where long-range connections contribute). Note that if $w_{ij} = 0$, the 
substitution of $\varphi^{(n\to m)}(\mathbf{x},\mathbf{y}_l) = \varphi(\mathbf{x},\mathbf{y}_l)$ 
into Eq. 14 recovers Eq. 13, which in turn reduces to Eq. [8] of the main text as expected. 
The discrete nature of the network nodes allows a direct solution in terms of 
$\varphi(\mathbf{x},\mathbf{x}')$, and it is convenient to define $M = (L_u + L_0)^{-1}$, 
where $(L_0)_{ij} = W_i\delta_{ij} - w_{ij}$ is the un-normalized graph Laplacian of the 
network nodes (in the absence of any material nodes) and 
$(L_u)_{ij} = U(\mathbf{y}_i)\delta_{ij} - u(\mathbf{y}_i,\mathbf{y}_j)$ incorporates the 
effects of the material nodes on the strength of the network nodes. It is important to note 
that $L_u + L_0$ is not the graph Laplacian of the entire network, and can be inverted so 
long as $u(\mathbf{z},\mathbf{z}') > 0$. It is straightforward to derive the solution for 
$\varphi^{(n\to m)}$ in terms of $\varphi$, with 


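$$
\left[ (L_u + L_0)\,\varphi^{(n\to m)}(\mathbf{x}) \right]_l =
  u^2(\mathbf{x},\mathbf{y}_l)
  + \int d\mathbf{z}\, \varphi(\mathbf{x},\mathbf{z})\, u(\mathbf{y}_l,\mathbf{z})
    \left[ \rho(\mathbf{z}) - \delta(\mathbf{x}-\mathbf{z}) \right]
  - \sum_{m} \varphi(\mathbf{x},\mathbf{y}_m)\, u(\mathbf{y}_l,\mathbf{y}_m),
\tag{15}
$$

where $[\varphi^{(n\to m)}(\mathbf{x})]_k = \varphi^{(n\to m)}(\mathbf{x},\mathbf{y}_k)$. 
Eq. 15 can be solved directly by applying the inverse of $L_u + L_0$, and substitution of 
the result into Eq. [13] yields 

$$
\begin{aligned}
U(\mathbf{x}')\varphi(\mathbf{x},\mathbf{x}') ={}& u^2(\mathbf{x},\mathbf{x}')
  + \sum_{kl} u(\mathbf{x}',\mathbf{y}_k) M_{kl}\, u^2(\mathbf{x},\mathbf{y}_l)
  - \sum_{l} u(\mathbf{x}',\mathbf{y}_l)\,\varphi(\mathbf{x},\mathbf{y}_l) \\
 & + \int d\mathbf{z}\, \varphi(\mathbf{x},\mathbf{z})
    \left[ \rho(\mathbf{z}) - \delta(\mathbf{x}-\mathbf{z}) \right]
    \left[ u(\mathbf{x}',\mathbf{z})
      + \sum_{kl} u(\mathbf{x}',\mathbf{y}_k) M_{kl}\, u(\mathbf{y}_l,\mathbf{z}) \right] \\
 & - \sum_{kl} u(\mathbf{x}',\mathbf{y}_k) M_{kl}
    \sum_{m} \varphi(\mathbf{x},\mathbf{y}_m)\, u(\mathbf{y}_l,\mathbf{y}_m),
\end{aligned}
\tag{16}
$$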


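where $M = (L_u + L_0)^{-1}$. Eq. 16 is linear in $\varphi(\mathbf{x},\mathbf{x}')$, meaning 
that it can be solved exactly knowing the locations of and interactions between network 
nodes. In particular, we can write 

$$
\varphi(\mathbf{x},\mathbf{x}') = \varphi_0(\mathbf{x},\mathbf{x}')
  + \int_{\Omega} d\mathbf{z}\, K_1(\mathbf{x},\mathbf{x}',\mathbf{z})\,\varphi(\mathbf{x},\mathbf{z}),
\tag{17}
$$

where 

$$
\varphi_0(\mathbf{x},\mathbf{x}') = \frac{u^2(\mathbf{x},\mathbf{x}')}{U(\mathbf{x}')}
  + \frac{1}{U(\mathbf{x}')} \sum_{kl} u(\mathbf{x}',\mathbf{y}_k) M_{kl}\, u^2(\mathbf{x},\mathbf{y}_l)
\tag{18}
$$

is the direct connection term, renormalized to include the effects of the long-distance 
connection, and the kernel becomes 

$$
K_1(\mathbf{x},\mathbf{x}',\mathbf{z}) =
  \left[ 1 - \frac{\delta(\mathbf{x}-\mathbf{z})}{\rho(\mathbf{x})}
    - \sum_{m} \frac{\delta(\mathbf{y}_m-\mathbf{z})}{\rho(\mathbf{y}_m)} \right]
  K_1^{(b)}(\mathbf{x}',\mathbf{z}),
\tag{19}
$$

where 

$$
K_1^{(b)}(\mathbf{x}',\mathbf{z}) = \frac{\rho(\mathbf{z})}{U(\mathbf{x}')}
  \left[ u(\mathbf{x}',\mathbf{z})
    + \sum_{kl} u(\mathbf{x}',\mathbf{y}_k) M_{kl}\, u(\mathbf{y}_l,\mathbf{z}) \right]
\tag{20}
$$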


is a ‘bare’ kernel (in the absence of any delta function constraints). It is worthwhile to 
note that the dominant contribution to $\varphi(\mathbf{x},\mathbf{x}')$ comes from 
$\varphi_0(\mathbf{x},\mathbf{x}')$ in Eq. 17 (with higher order contributions coming from 
integrals weighted by $u$). If long-range interactions are expected to be perturbative and 
local interactions are very short-ranged, $\varphi_0(\mathbf{x},\mathbf{x}')$ may be a 
sufficient approximation to avoid the complexity of an exact evaluation of 
$\varphi(\mathbf{x},\mathbf{x}')$. However, the undesirable behavior of the linearized 
importance, pictured in Fig. [8] of the main text, suggests that this further-approximated 
value of $\varphi(\mathbf{x},\mathbf{x}')$ should be used with caution. 
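
To make the direct-connection term of Eq. 18 concrete, the following sketch builds the 
matrices L_0 and L_u, inverts their sum to obtain M, and evaluates the renormalized direct 
term for one pair of material nodes. Everything specific here is an assumption made for 
illustration: the three network-node positions, the weights w, the exponential interaction 
u(r) = exp(-l r), and the use of a constant U equal to its unbounded-domain value 
8 pi rho / l^3. The names u, phi0 and U_const are hypothetical. 

```python
import numpy as np

def u(r, ell=1.0):
    """Assumed short-range interaction u(|x - x'|) = exp(-ell * r)."""
    return np.exp(-ell * r)

# Hypothetical setup: three network nodes in 3D with symmetric long-range weights.
y = np.array([[0.0, 0.0, 0.0], [2.0, 0.0, 0.0], [0.0, 3.0, 0.0]])
w = np.array([[0.0, 1.0, 0.5],
              [1.0, 0.0, 2.0],
              [0.5, 2.0, 0.0]])
rho, ell = 1.0, 1.0
# Assumption: constant density in an unbounded domain, so U is the same everywhere
# and equals rho times the integral of exp(-ell r) over all of 3D space.
U_const = 8.0 * np.pi * rho / ell**3

# (L_0)_ij = W_i delta_ij - w_ij and (L_u)_ij = U(y_i) delta_ij - u(y_i, y_j).
W = w.sum(axis=1)
L0 = np.diag(W) - w
dist_yy = np.linalg.norm(y[:, None, :] - y[None, :, :], axis=-1)
Lu = U_const * np.eye(len(y)) - u(dist_yy, ell)
M = np.linalg.inv(Lu + L0)

def phi0(x, xp):
    """[u^2(x, x') + sum_kl u(x', y_k) M_kl u^2(x, y_l)] / U(x')."""
    u_xp_y = u(np.linalg.norm(y - xp, axis=1), ell)       # u(x', y_k)
    u2_x_y = u(np.linalg.norm(y - x, axis=1), ell) ** 2   # u^2(x, y_l)
    direct = u(np.linalg.norm(x - xp), ell) ** 2          # u^2(x, x')
    return (direct + u_xp_y @ M @ u2_x_y) / U_const

x = np.array([0.1, 0.0, 0.0])    # material node close to network node 0
xp = np.array([2.1, 0.0, 0.0])   # material node close to network node 1
print("with long-range term:   ", phi0(x, xp))
print("short-range term alone: ", u(np.linalg.norm(x - xp), ell) ** 2 / U_const)
```

In this toy configuration the matrix L_u + L_0 is strictly diagonally dominant, so the 
inverse exists and the long-range contribution adds to the bare short-range term. 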

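The most straightforward approach to solving Eq. [9] of the main text is the development 
of a Neumann series, and it is simple to show that the solution can be written as 
$\varphi(\mathbf{x},\mathbf{x}') = \sum_{n=0}^{\infty} \varphi_n(\mathbf{x},\mathbf{x}')$ with 
$\varphi_n(\mathbf{x},\mathbf{x}') = \int d\mathbf{z}\, K_n(\mathbf{x},\mathbf{x}',\mathbf{z})\,\varphi_0(\mathbf{x},\mathbf{z})$ 
for $n \geq 1$ and 
$K_n(\mathbf{x},\mathbf{x}',\mathbf{z}) = \int d\mathbf{z}'\, K_1(\mathbf{x},\mathbf{x}',\mathbf{z}')\, K_{n-1}(\mathbf{x},\mathbf{z}',\mathbf{z})$. 
The delta function constraints make this form of the kernel difficult to work with, but 
it is possible to express $K_n(\mathbf{x},\mathbf{x}',\mathbf{z})$ in terms of the more 
easily computed 
$K_n^{(b)}(\mathbf{x}',\mathbf{z}) = \int d\mathbf{z}'\, K_1^{(b)}(\mathbf{x}',\mathbf{z}')\, K_{n-1}^{(b)}(\mathbf{z}',\mathbf{z})$. 
It is not difficult to show that for a given location $\mathbf{x}$ and known 
$\{\mathbf{y}_l\}$ the full kernel has the form 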



with large and unwieldy recursive relations defining the coefficients $b_m$, $c_m$ and 
$d_{ml}$, each of which depends on $\mathbf{x}$ and $\{\mathbf{y}_l\}$. Importantly, though, 
the coefficients do not depend on $\mathbf{z}$, the variable of integration in the 
definition of $\varphi_n(\mathbf{x},\mathbf{x}')$. It is therefore in principle possible to 
determine $\varphi(\mathbf{x},\mathbf{x}')$ exactly with knowledge of $K_n^{(b)}$, which can 
be determined using standard methods. However, as noted in SI Sec. [C], integrals over 
$1/U(\mathbf{x}')$ are expected to be very unwieldy, and numerical work is likely required 
to determine the propagator. 
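
The mechanics of a Neumann series can be illustrated on a toy problem. The sketch below is 
not the kernel K_1 of this appendix: it assumes a one-dimensional domain [0, 1], a simple 
exponential kernel scaled so that the series converges quickly, and a midpoint-rule 
discretization. It sums the terms phi_n obtained by repeated application of the kernel to 
phi_0 and compares the result with a direct solution of the discretized equation. 

```python
import numpy as np

# Toy 1D problem on [0, 1]: phi(x') = phi0(x') + integral of K(x', z) phi(z) dz.
# The kernel below is an assumed contraction, NOT the kernel K_1 of the text.
n = 200
h = 1.0 / n
grid = (np.arange(n) + 0.5) * h                                # midpoint-rule nodes
K = 0.3 * np.exp(-np.abs(grid[:, None] - grid[None, :])) * h   # kernel times quadrature weight
phi0 = np.exp(-grid)

# Neumann series: phi = sum_n phi_n, where phi_n is the kernel applied n times to phi_0.
phi = phi0.copy()
term = phi0.copy()
for _ in range(60):
    term = K @ term
    phi += term

# Reference: direct solution of the discretized system (I - K) phi = phi0.
phi_direct = np.linalg.solve(np.eye(n) - K, phi0)
print("max deviation from direct solve:", np.max(np.abs(phi - phi_direct)))
```

For the kernel of Eq. 19 the convergence is expected to be far less favorable, which is 
why the text anticipates that numerical work is needed. 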


C Determining U(x) for a simple interaction 

As long as our domain is finite, material nodes on the boundary of the domain will be less 
connected than material points in the interior. This will be reflected in a non-constant 
U(x), which gives rise to a variable importance of the material nodes. Computing U(x) is 
not necessarily trivial. If we suppose that $u(\mathbf{x},\mathbf{x}') = u(|\mathbf{x}-\mathbf{x}'|)$ 
and that the domain $\Omega$ is a sphere of radius $R$ with a constant node density 
$\rho(\mathbf{x}) = V^{-1}$, where $V = 4\pi R^3/3$, we find 

$$
U(\mathbf{x}) = \frac{1}{V} \int_{\Omega} d^3x'\, u(|\mathbf{x}-\mathbf{x}'|)
  = \frac{1}{V} \int \frac{d^3k}{(2\pi)^3}\, e^{i\mathbf{k}\cdot\mathbf{x}}\, \tilde{u}(|\mathbf{k}|)
    \int_{\Omega} d^3y\, e^{-i\mathbf{k}\cdot\mathbf{y}}
\tag{22}
$$

$$
\phantom{U(\mathbf{x})} = \frac{3}{2\pi^2 R^3} \int_0^{\infty} dk\, \tilde{u}(k)
  \left( \frac{\sin(kR) - kR\cos(kR)}{k^2 x} \right) \sin(kx),
\tag{23}
$$

where $\tilde{u}(k)$ is the Fourier transform of $u$ and $x = |\mathbf{x}|$. As a 
specific example, if $u(|\mathbf{x}-\mathbf{y}|) = e^{-l|\mathbf{x}-\mathbf{y}|}$, with 
$\tilde{u}(k) = 8\pi l/(k^2 + l^2)^2$ in three dimensions, we find 

$$
U(\mathbf{x}) = \frac{6}{R^3 l^3} + \frac{3 e^{-lR}}{R^3 l^4 x}
  \left( \cosh(lx)\left[ lx + l^2 xR \right] - \sinh(lx)\left[ l^2R^2 + 3lR + 3 \right] \right),
\tag{24}
$$

yielding an expression for the total strength at the material node $\mathbf{x}$ expressed 
in terms of elementary functions only. Other forms of the interaction strength (for 
example, a Gaussian) tend to produce more complex expressions, and we choose to use only 
an exponential decay for the interaction strength between material nodes. Regardless of 
the (relative) simplicity of Eq. 24, terms involving integrals of $1/U(\mathbf{x}')$ will 
be difficult to evaluate exactly (as the reader may readily verify), meaning that the 
Neumann series [52, 53] described in Appendix [B] is not easily computed analytically. 
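
A closed form of this kind can be cross-checked by direct quadrature. The sketch below 
assumes the exponential interaction exp(-l r) inside a ball of radius R with a uniform 
density rho, and writes the closed form for a general rho (setting rho = 3 / (4 pi R^3) 
corresponds to a density of one over the volume of the ball). The closed-form expression 
in the code was obtained by integrating over spherical shells and is stated here as an 
assumption to be verified, not as a quotation of Eq. 24; the function names are 
hypothetical. 

```python
import numpy as np

def U_closed(x, R, ell, rho):
    """Assumed closed form of rho * integral over |x'| < R of exp(-ell |x - x'|), 0 < x <= R."""
    bracket = (np.cosh(ell * x) * (ell * x + ell**2 * x * R)
               - np.sinh(ell * x) * (ell**2 * R**2 + 3 * ell * R + 3))
    return rho * (8 * np.pi / ell**3
                  + 4 * np.pi * np.exp(-ell * R) / (ell**4 * x) * bracket)

def U_quadrature(x, R, ell, rho, n=200):
    """The same integral by Gauss-Legendre quadrature in spherical coordinates."""
    t, wt = np.polynomial.legendre.leggauss(n)   # nodes and weights on [-1, 1]
    total = 0.0
    # Split the radial integral at r = x, where the integrand is least smooth.
    for lo, hi in ((0.0, x), (x, R)):
        r = 0.5 * (hi - lo) * t + 0.5 * (hi + lo)
        wr = 0.5 * (hi - lo) * wt
        # Distance from the field point (radius x, on the polar axis) to the point (r, mu).
        d2 = x**2 + r[:, None]**2 - 2 * x * r[:, None] * t[None, :]
        dist = np.sqrt(np.maximum(d2, 0.0))
        inner = (np.exp(-ell * dist) * wt[None, :]).sum(axis=1)   # integral over mu = cos(theta)
        total += 2 * np.pi * np.sum(wr * r**2 * inner)
    return rho * total

R, ell = 1.0, 2.0
rho = 3.0 / (4.0 * np.pi * R**3)
for x in (0.3, 0.5, 0.9):
    print(x, U_closed(x, R, ell, rho), U_quadrature(x, R, ell, rho))
```

With these parameters the strength decreases from the interior towards the boundary, in 
line with the remark that boundary nodes are less connected than interior ones. 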


D The Nonlinear continuum model 


The development of the continuum model for the nonlinear GENs follows the construction 
in Appendix [A] in a straightforward fashion. The discrete GENs are constrained to have 
$E_{ii} = 0$, but in the continuum limit we will have a self-energy contribution (similar 
to that of the self-interaction due to $u(\mathbf{x}_i,\mathbf{x}_i) \neq 0$ above). The 
$\delta$-function constraints for the nonlinear GENs are most easily developed by writing 

$$
\frac{U_j}{E(\mathbf{x}_i,\mathbf{x}_j)} = u^2(\mathbf{x}_i,\mathbf{x}_j)
  + \sum_{j'} \frac{u^2(\mathbf{x}_j,\mathbf{x}_{j'})}{E(\mathbf{x}_i,\mathbf{x}_{j'})\,u(\mathbf{x}_j,\mathbf{x}_{j'}) + 1}
  - \frac{u^2(\mathbf{x}_i,\mathbf{x}_j)}{E(\mathbf{x}_i,\mathbf{x}_i)\,u(\mathbf{x}_i,\mathbf{x}_j) + 1}
  - \frac{u^2(\mathbf{x}_j,\mathbf{x}_j)}{E(\mathbf{x}_i,\mathbf{x}_j)\,u(\mathbf{x}_j,\mathbf{x}_j) + 1},
\tag{25}
$$

where the first term on the right hand side correctly handles the direct connection between 
nodes $i$ and $j$; the second term is the sum over all nodes (including $i$ and $j$); the 
third term removes the contribution due to a non-zero $E(\mathbf{x}_i,\mathbf{x}_i)$; and 
the fourth term removes the contribution due to the connection of the node at 
$\mathbf{x}_j$ to itself (the self-loop). It is not difficult to see that in the continuum 
limit, this expression becomes 


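$$
\frac{U(\mathbf{x}') - u(\mathbf{x}',\mathbf{x}')}{E(\mathbf{x},\mathbf{x}')}
  + \frac{u^2(\mathbf{x}',\mathbf{x}')}{E(\mathbf{x},\mathbf{x}')\,u(\mathbf{x}',\mathbf{x}') + 1}
  = u^2(\mathbf{x},\mathbf{x}')
  + \int d\mathbf{z}\, \rho(\mathbf{z})\,
    \frac{u^2(\mathbf{x}',\mathbf{z})}{E(\mathbf{x},\mathbf{z})\,u(\mathbf{x}',\mathbf{z}) + 1}
  - \frac{u^2(\mathbf{x},\mathbf{x}')}{E(\mathbf{x},\mathbf{x})\,u(\mathbf{x},\mathbf{x}') + 1},
\tag{26}
$$

after recalling that $U_j = U(\mathbf{x}_j) - u(\mathbf{x}_j,\mathbf{x}_j)$. The second term 
on the left hand side of the equation is due to the removal of the self-loop. It is clear 
that Eq. [26] cannot be expressed in terms of any standard integral equation, and the 
nonlinearity guarantees that analytic work is unlikely to be fruitful without further 
approximation. If we assume that $E(\mathbf{x},\mathbf{x}') \gg u^{-1}(\mathbf{x},\mathbf{x}')$ 
for all $\mathbf{x}$ and $\mathbf{x}'$ and define 
$\varphi(\mathbf{x},\mathbf{x}') \sim E^{-1}(\mathbf{x},\mathbf{x}')$, each nonlinear term 
reduces to $u^2/(Eu + 1) \approx u\varphi$, and it is straightforward to see that 

$$
U(\mathbf{x}')\varphi(\mathbf{x},\mathbf{x}') = u^2(\mathbf{x},\mathbf{x}')
  + \int d\mathbf{z}\, \varphi(\mathbf{x},\mathbf{z})\, u(\mathbf{x}',\mathbf{z})
    \left[ \rho(\mathbf{z}) - \delta(\mathbf{x}-\mathbf{z}) \right],
\tag{27}
$$

which is Eq. [8] of the main text. The fact that Eq. [8] of the main text is the well known 
inhomogeneous Fredholm equation [52, 53] means that analytic work is possible using the 
linearized importance. The linearized importance was discussed in more detail in Appendix 
[B]. 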
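
The relationship between the nonlinear closeness and its linearization can be explored on 
a small discrete network. The sketch below assumes that the discrete GENs satisfy the 
relation implied by Eq. 25 with self-loops removed, namely 
U_j / E_ij = u_ij^2 + sum over k != i, j of u_jk^2 / (E_ik u_jk + 1), with E_ii = 0 and 
symmetric weights. Solving it by fixed-point iteration starting from E = 0 is a choice made 
here for illustration and is not claimed to be the authors' procedure; the function names 
are hypothetical. The linearized importance is obtained from the corresponding linear 
system U_j phi_ij = u_ij^2 + sum over k != i, j of u_jk phi_ik. 

```python
import numpy as np

def solve_gens(u, n_iter=500):
    """Fixed-point iteration for the assumed discrete GEN relation (symmetric u, zero diagonal)."""
    n = len(u)
    U = u.sum(axis=1)                    # node strengths
    off = ~np.eye(n, dtype=bool)
    E = np.zeros((n, n))                 # E_ii = 0 is kept fixed throughout
    for _ in range(n_iter):
        # T[i, j] = sum_k u_jk^2 / (E_ik u_jk + 1). The k = j term vanishes because
        # u_jj = 0, and the k = i term equals u_ij^2 because E_ii = 0.
        T = (u[None, :, :] ** 2 / (E[:, None, :] * u[None, :, :] + 1.0)).sum(axis=2)
        E = np.where(off, U[None, :] / T, 0.0)
    return E

def linearized_importance(u):
    """Solve U_j phi_ij - sum_{k != i, j} u_jk phi_ik = u_ij^2 for each source node i."""
    n = len(u)
    U = u.sum(axis=1)
    phi = np.zeros((n, n))
    for i in range(n):
        keep = np.arange(n) != i
        A = np.diag(U[keep]) - u[np.ix_(keep, keep)]
        phi[i, keep] = np.linalg.solve(A, u[i, keep] ** 2)
    return phi

# Complete graph on five nodes with unit weights.
u5 = np.ones((5, 5)) - np.eye(5)
E5 = solve_gens(u5)
phi5 = linearized_importance(u5)
print("E_01 =", E5[0, 1], " 1/E_01 =", 1.0 / E5[0, 1], " phi_01 =", phi5[0, 1])
```

For this uniform example the fixed point satisfies E^2 = n - 1, so E = 2, while the 
linearized equation gives phi = 1 rather than 1/E = 0.5. This gap on a small, densely 
connected graph is consistent with the advice in Appendix B to treat the linearized 
importance with caution. 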

In the continuum model, the long range connections between nodes at the specified points 
$\{\mathbf{y}_l\}$ must be enforced by delta-function constraints (since the material nodes 
an infinitesimal distance away from $\mathbf{y}_j$ will not have a long-range connection), 
much like the self-loops are removed. The closeness between material nodes, 
$E(\mathbf{x}_i,\mathbf{x}_j)$, is still expected to be a continuous function, but we must 
also account for the long-range connections between network nodes. Eq. [1] of the main text 
determines the closeness felt by a network node $\mathbf{y}_j$ towards a material node 
$\mathbf{x}_i$ in terms of its direct connections to the other network nodes 
$\{\mathbf{y}_l\}$ (with strength $w_{jl}$) and the closeness felt by $\mathbf{y}_l$ towards 
$\mathbf{x}_i$. When computing the closeness felt by any node towards a material node, we 
therefore do not need to compute the closeness felt by any node towards a network node. It 
suffices to compute the material-material closeness $E(\mathbf{x}_i,\mathbf{x}_j)$ as well 
as the network-material closeness $E^{(n\to m)}(\mathbf{x}_i,\mathbf{y}_l)$. In the discrete 
case, this labeling is irrelevant, but it is essential in the continuous case. We then write 


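$$
\frac{U(\mathbf{x}_j) - u(\mathbf{x}_j,\mathbf{x}_j)}{E(\mathbf{x}_i,\mathbf{x}_j)}
  = u^2(\mathbf{x}_i,\mathbf{x}_j)
  + \sum_{\mathbf{x}_l \notin \{\mathbf{y}_l\}}
    \frac{u^2(\mathbf{x}_j,\mathbf{x}_l)}{E(\mathbf{x}_i,\mathbf{x}_l)\,u(\mathbf{x}_j,\mathbf{x}_l) + 1}
  + \sum_{\mathbf{y}_l}
    \frac{u^2(\mathbf{x}_j,\mathbf{y}_l)}{E^{(n\to m)}(\mathbf{x}_i,\mathbf{y}_l)\,u(\mathbf{x}_j,\mathbf{y}_l) + 1}
\tag{28}
$$

and 

$$
\frac{U(\mathbf{y}_j) + W_j - u(\mathbf{y}_j,\mathbf{y}_j)}{E^{(n\to m)}(\mathbf{x}_i,\mathbf{y}_j)}
  = u^2(\mathbf{x}_i,\mathbf{y}_j)
  + \sum_{\mathbf{x}_l \notin \{\mathbf{y}_l\}}
    \frac{u^2(\mathbf{y}_j,\mathbf{x}_l)}{E(\mathbf{x}_i,\mathbf{x}_l)\,u(\mathbf{y}_j,\mathbf{x}_l) + 1}
  + \sum_{\mathbf{y}_l \neq \mathbf{y}_j}
    \frac{\left[ u(\mathbf{y}_j,\mathbf{y}_l) + w_{jl} \right]^2}
         {E^{(n\to m)}(\mathbf{x}_i,\mathbf{y}_l)\left[ u(\mathbf{y}_j,\mathbf{y}_l) + w_{jl} \right] + 1}.
\tag{29}
$$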


While we have clearly delineated the distinction between $E(\mathbf{x},\mathbf{x}')$ and 
$E^{(n\to m)}(\mathbf{x},\mathbf{y})$, there is no approximation here and Eqs. 28 and 29 are 
identical to Eq. [1] of the main text. In the continuum limit, only the sum over the 
material nodes can be converted to an integral, as the number of network nodes $N$ remains 
fixed. We can then write 


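$$
\begin{aligned}
\frac{U(\mathbf{x}') - u(\mathbf{x}',\mathbf{x}')}{E(\mathbf{x},\mathbf{x}')}
  + \frac{u^2(\mathbf{x}',\mathbf{x}')}{E(\mathbf{x},\mathbf{x}')\,u(\mathbf{x}',\mathbf{x}') + 1}
  ={}& u^2(\mathbf{x},\mathbf{x}')
  + \sum_{l} \frac{u^2(\mathbf{x}',\mathbf{y}_l)}{E^{(n\to m)}(\mathbf{x},\mathbf{y}_l)\,u(\mathbf{x}',\mathbf{y}_l) + 1} \\
 & + \int d\mathbf{z}\,
    \frac{u^2(\mathbf{x}',\mathbf{z})}{E(\mathbf{x},\mathbf{z})\,u(\mathbf{x}',\mathbf{z}) + 1}
    \left[ \rho(\mathbf{z}) - \delta(\mathbf{x}-\mathbf{z}) - \sum_{l} \delta(\mathbf{y}_l-\mathbf{z}) \right]
\end{aligned}
\tag{30}
$$

and 

$$
\begin{aligned}
\frac{U(\mathbf{y}_j) - u(\mathbf{y}_j,\mathbf{y}_j) + W_j}{E^{(n\to m)}(\mathbf{x},\mathbf{y}_j)}
  + \frac{u^2(\mathbf{y}_j,\mathbf{y}_j)}{E^{(n\to m)}(\mathbf{x},\mathbf{y}_j)\,u(\mathbf{y}_j,\mathbf{y}_j) + 1}
  ={}& u^2(\mathbf{x},\mathbf{y}_j)
  + \sum_{l} \frac{\left[ u(\mathbf{y}_j,\mathbf{y}_l) + w_{jl} \right]^2}
         {E^{(n\to m)}(\mathbf{x},\mathbf{y}_l)\left[ u(\mathbf{y}_j,\mathbf{y}_l) + w_{jl} \right] + 1} \\
 & + \int d\mathbf{z}\,
    \frac{u^2(\mathbf{y}_j,\mathbf{z})}{E(\mathbf{x},\mathbf{z})\,u(\mathbf{y}_j,\mathbf{z}) + 1}
    \left[ \rho(\mathbf{z}) - \delta(\mathbf{x}-\mathbf{z}) - \sum_{l} \delta(\mathbf{y}_l-\mathbf{z}) \right].
\end{aligned}
\tag{31}
$$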


The full nonlinear version of the continuum limit is again clearly not analytically 
approachable, but the linearized version can be reduced to a single Fredholm equation, as 
is discussed further in Appendix [B]. If desired, the closeness felt by a material node 
towards a network node can be similarly computed (after defining a material-to-network 
closeness $E^{(m\to n)}(\mathbf{y}_i,\mathbf{x})$), which we do not explicitly compute here. 


E Convergence of the Neumann Series 


It is important to note that if $u(\mathbf{x},\mathbf{x}') = u(|\mathbf{x}-\mathbf{x}'|)$, 
which is a natural choice for the form of the location dependence of the short range 
interactions, it is unlikely that the Neumann series will converge in an unbounded domain. 
The convergence of the series $\sum_n \varphi_n^{(b)}(\mathbf{x},\mathbf{x}')$ depends on 
the $L^2$ norm of the kernel being bounded over the domain. For constant $\rho$ and in an 
unbounded domain, $U(\mathbf{x}) = U$ is a constant for all $\mathbf{x}$ (since every node 
is connected in the same way to the other nodes in the infinite space) and it is 
straightforward to show that the $L^2$ norm is 

$$
\int d\mathbf{x}'\, d\mathbf{z}\, \rho(\mathbf{x}')\rho(\mathbf{z})
  \left| K_1^{(b)}(\mathbf{x}',\mathbf{z}) \right|^2
  = \frac{\rho^4}{U^2} \int d\mathbf{z}\, d\mathbf{x}'\, u^2(|\mathbf{x}'-\mathbf{z}|) \to \infty
\tag{32}
$$

after a change of variables to $\mathbf{z}' = \mathbf{z}-\mathbf{x}'$ (valid due to the 
domain's infinite extent). For any infinite domain, then, 
$\varphi^{(b)}(\mathbf{x},\mathbf{x}')$ is undefined. With the $\delta$-function constraints 
included, there will not be a divergence in $\varphi$, but the sum 
$\varphi(\mathbf{x},\mathbf{x}') = \sum_n \varphi_n(\mathbf{x},\mathbf{x}')$ will be weakly 
converging (e.g. a divergent sequence of alternating sign), and while theoretically exact 
will likely not be useful if numerical work is required. These issues may all be avoided if 
the domain is finite with volume $V$, at the cost of a more complex form for 
$U(\mathbf{x})$ (described in SI Sec. [C]). 
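
The growth of the kernel norm with domain size can be seen in a small numerical 
experiment. The sketch below is a one-dimensional analogue built on assumptions chosen for 
illustration: a segment [0, L], constant density, an exponential interaction, no network 
nodes, and a bare kernel of the form rho u(|x' - z|) / U(x') with U(x') computed by the 
same midpoint rule. It uses the plain (unweighted) double integral of the squared kernel, 
and the function names are hypothetical. 

```python
import numpy as np

def bare_kernel(L, ell=1.0, rho=1.0, h=0.05):
    """Discretized 1D analogue of the bare kernel rho u(|x' - z|) / U(x') on [0, L]."""
    n = int(round(L / h))
    grid = (np.arange(n) + 0.5) * h                  # midpoint-rule nodes
    u = np.exp(-ell * np.abs(grid[:, None] - grid[None, :]))
    U = rho * u.sum(axis=1) * h                      # U(x') = rho * integral of u(|x' - z|) dz
    return rho * u / U[:, None], h

def l2_norm_squared(L):
    """Double integral of the squared kernel over the domain."""
    K, h = bare_kernel(L)
    return np.sum(K**2) * h * h

def row_integrals(L):
    """Integral of the kernel over z for each x'; equal to one by construction of U."""
    K, h = bare_kernel(L)
    return K.sum(axis=1) * h

for L in (5.0, 10.0, 20.0):
    print(f"L={L}: squared L2 norm = {l2_norm_squared(L):.4f}")
```

In this toy setting the squared norm grows roughly in proportion to L, so it is unbounded 
as the domain becomes infinite, and each row of the bare kernel integrates to one, so 
repeated application of the kernel does not shrink its input. 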